C-KAN: A New Approach for Integrating Convolutional Layers with Kolmogorov–Arnold Networks for Time-Series Forecasting

Livieris, Ioannis E.

doi:10.3390/math12193022


by

Ioannis E. Livieris

Department of Statistics & Insurance Science, University of Piraeus, GR 18532 Piraeus, Greece

Mathematics 2024, 12(19), 3022; https://doi.org/10.3390/math12193022

Submission received: 16 August 2024 / Revised: 25 September 2024 / Accepted: 26 September 2024 / Published: 27 September 2024

(This article belongs to the Special Issue Advanced Information and Signal Processing: Models and Algorithms)


Abstract


Time-series forecasting represents one of the most challenging and widely studied research areas in both academic and industrial communities. Despite the recent advancements in deep learning, the prediction of future time-series values remains a considerable endeavor due to the complexity and dynamic nature of time-series data. In this work, a new prediction model is proposed, named C-KAN, for multi-step forecasting, which is based on integrating convolutional layers with Kolmogorov–Arnold network architecture. The proposed model’s advantages are (i) the utilization of convolutional layers for learning the behavior and internal representation of time-series input data; (ii) activation at the edges of the Kolmogorov–Arnold network for potentially altering training dynamics; and (iii) modular non-linearity for allowing the differentiated treatment of features and potentially more precise control over inputs’ influence on outputs. Furthermore, the proposed model is trained using the DILATE loss function, which ensures that it is able to effectively deal with the dynamics and high volatility of non-stationary time-series data. The numerical experiments and statistical analysis were conducted on five challenging non-stationary time-series datasets, and provide strong evidence that C-KAN constitutes an efficient and accurate model, well suited for time-series forecasting tasks.

Keywords:

convolutional layers; Kolmogorov–Arnold networks; forecasting; non-stationarity; time series

MSC:

68T07

1. Introduction

Time-series forecasting plays an integral role in many real-world industrial applications ranging from healthcare [1] and finance [2] to energy management [3] and agriculture [4]. The difficulty of this task stems from the complexity of time-series data, which are characterized by high volatility, noise and extreme directional movements. Nevertheless, the stationarity property [5] probably plays the most crucial role in the development of an accurate and reliable forecasting model, since it highly affects the performance of time-series prediction models. The major drawback when dealing with non-stationary time series is that the statistical properties of the series (such as mean, variance, etc.) change over time; hence, they are very difficult to model or forecast, since the model may indicate relationships between the variables which do not actually exist. As a result, the development of an intelligent model which is able to deal with non-stationary, noisy and highly volatile time-series data is one of the most challenging and complex prediction problems in the machine learning area [6].

In general, the development of a forecasting model is based on exploiting accumulated historical time-series data. Suppose that $\{y_{i}\}_{i=1}^{n}$ is a sequence of n observations of a time series. The major goal in multi-step time-series forecasting is the development of a prediction model f, which is defined by

$$\hat{y}_{t+h}, \hat{y}_{t+h-1}, \dots, \hat{y}_{t+1} = f(y_{t}, y_{t-1}, \dots, y_{t-k+1}; \theta) + \epsilon_{t} \tag{1}$$

where $k \in \mathbb{N}^{*}$ is the number of past observations of the series (look-back window), $h \in \mathbb{N}^{*}$ is the number of future time steps (forecasting horizon), $\theta$ is the vector of the model’s parameters and $\epsilon_{t}$ is the white noise residual at step t. For defining model f in Equation (1), a variety of approaches have been proposed in the literature, which range from statistical-based models such as the AutoRegressive Integrated Moving Average [7] to sophisticated machine learning models, from traditional multi-layer perceptrons to convolutional-based networks and transformer-based models [8,9,10,11,12,13,14]. However, most of these models are frequently unable to model complex, noisy and non-stationary time-series data effectively, since they cannot capture the stochastic nature and high volatility of time series [6,15,16,17].
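As an illustration of the look-back window k and the forecasting horizon h in Equation (1), the following minimal sketch splits a series into (past, future) training pairs. The function name `make_windows` and the toy series are hypothetical and are not taken from the paper's code.

```python
def make_windows(series, k, h):
    """Split a series into (look-back, horizon) pairs.

    Each pair holds the k most recent observations y_{t-k+1}, ..., y_t
    and the h values y_{t+1}, ..., y_{t+h} that follow them.
    """
    if k < 1 or h < 1:
        raise ValueError("k and h must be positive integers")
    samples = []
    for t in range(k, len(series) - h + 1):
        past = series[t - k:t]       # look-back window of length k
        future = series[t:t + h]     # forecasting horizon of length h
        samples.append((past, future))
    return samples


# Toy series 1, 2, ..., 10 with k = 3 and h = 2.
series = list(range(1, 11))
pairs = make_windows(series, k=3, h=2)
```

A series of n observations yields n - k - h + 1 pairs, so the toy series above produces six pairs, the first being ([1, 2, 3], [4, 5]).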

Quite recently, Liu et al. [18] revolutionized the area of neural networks and deep learning by introducing Kolmogorov–Arnold networks (KAN) as a promising alternative to the well-established multi-layer perceptrons (MLP). This new class of neural networks was inspired by the Kolmogorov–Arnold representation theorem [19], and the major novelty is the learnable activation functions on the network’s weights in contrast to traditional MLPs, which possess fixed activation functions on neurons. Specifically, a KAN model possesses no linear weights, since each weight is actually substituted by a univariate spline function. The authors theoretically proved that KANs possess faster neural scaling laws than MLPs and provided empirical results showcasing that KANs outperform MLPs in terms of accuracy. Along this line, this new type of model has been evaluated on time-series forecasting problems [20], providing some interesting results.

In this work, a new model is proposed, named C-KAN, for accurate and reliable time-series forecasting, which is based on the integration of convolutional layers into a Kolmogorov–Arnold network architecture, as well as the utilization of the DILATE loss function [8] for handling non-stationarity. The primary idea is to exploit convolutional layers’ capability for feature extraction in order to provide higher-quality data than the original inputs to the Kolmogorov–Arnold network, thereby enhancing their remarkable prediction ability. Furthermore, the proposed model is trained using the DILATE function, which ensures that it is able to effectively deal with non-stationary time series. The notable advantage of the proposed approach is that it could lead to the development of effective and robust time-series forecasting models due to its special architectural design, being capable of handling the complexity and dynamic volatility of time-series forecasting tasks. Specifically, C-KAN possesses four main characteristics: (i) convolutional layers, which are dedicated to learning the behavior and internal representation of time-series input data; (ii) activation at the edges of the Kolmogorov–Arnold network for potentially altering training dynamics; (iii) modular non-linearity for allowing the differentiated treatment of features and ultimately more precise control over inputs’ influence on outputs; and (iv) the DILATE loss function for supporting precise shape and temporal change detection occurring in non-stationary data. The performance of the proposed C-KAN model is compared against state-of-the-art forecasting models on five well-known and challenging non-stationary time-series datasets, which are characterized by high volatility as well as sudden and sharp changes. The detailed experimental analysis reveals that the proposed model outperforms traditional models, while the reported statistical analysis ensures the robustness and effectiveness of the C-KAN model. 
Additionally, an ablation study is conducted for examining the sensitivity of the proposed model’s performance to different configurations. In summary, the main contributions of this research are as follows:

A new prediction model, named C-KAN, is proposed for multi-step time-series forecasting, which is based on the employment of a Kolmogorov–Arnold network on the top of a series of convolutional layers, as well as the use of the DILATE loss function for handling sharp changes in target values and the high volatility of non-stationary time series.
An extensive experimental evaluation is conducted on five complex and non-stationary time-series datasets, which are characterized by considerable fluctuations and pronounced instability, for providing strong empirical and statistical evidence about the effectiveness of the proposed model.
A number of examples are provided which demonstrate the proposed C-KAN model’s ability to capture complex temporal patterns and generate more accurate forecasts than traditional state-of-the-art models.

The remainder of this paper is organized as follows: Section 2 presents a brief review on state-of-the-art forecasting models for time-series prediction. Section 3 presents a detailed description of the proposed C-KAN model, paying special attention to its main characteristics and advantages. Section 4 presents the comprehensive numerical experiments as well as the evaluation and statistical analysis. Finally, Section 5 discusses the main findings of the proposed approach and its limitations and proposes some interesting directions for future work.

2. Related Work

Time-series forecasting constitutes one of the most challenging and complex research areas in machine learning, which has been successfully employed for dealing with many diverse real-world application benchmarks [1,2,4]. The difficulty arises from the complexity of time-series data, which are frequently characterized by high volatility, significant fluctuations and non-stationarity. In addition, their internal complex dynamics, compounded by various influencing factors, further increase the difficulty of generating reliable forecasts. During the last decade, a number of rewarding studies were published introducing robust deep learning models for effectively and accurately forecasting time series. In the remainder of this section, the most representative studies in this field are presented.

Woo et al. [9] proposed a meta-optimization framework, named DeepTime, for addressing the drawbacks of deep learning time-index models for time-series forecasting. The proposed framework splits the time-index-based learning process into two distinct stages: (i) the inner learning process, which operates as the traditional supervised learning process, and (ii) the outer learning process, which enhances extrapolation by enabling the time-index model to learn strong inductive bias. Additionally, the authors emphasized the ability of the DeepTime model to learn high-frequency time-series patterns using a novel concatenated Fourier-based module. The presented numerical experiments showed that DeepTime exhibited competitive and sometimes superior performance compared to more traditional deep learning time-series models with more complex architectures on real-world stationary and non-stationary benchmarks. Finally, the authors stated that a limitation of their work is the fact that DeepTime does not take into consideration events and holidays, similar to other time-index models.

Le Guen and Thome [8] introduced a new loss function, named DIstortion Loss including shApe and TimE (DILATE), for training deep learning time-series models. The primary aim of DILATE is to efficiently handle non-stationary data by accurately predicting sudden changes through the incorporation of two terms, which focus on supporting precise shape and temporal change detection. Furthermore, an important advantage of DILATE is that it is agnostic to the selection of the neural network model. For providing empirical evidence about the efficiency of the proposed approach, the authors compared the performance of a Sequence-to-Sequence GRU-based (Seq2Seq) time-series model, which was trained with DILATE and with a variety of traditional loss functions. The experimental analysis revealed that DILATE is able to effectively increase the performance of Seq2Seq time-series models. Nevertheless, a limitation of this work is that the effect of DILATE loss on the performance of a time-series model was evaluated for only one model.

Oreshkin et al. [10] proposed a new deep neural architecture model, named N-BEATS, for univariate time-series forecasting. The proposed model was based on a deep stack of fully connected dense layers and on backward and forward residual links. The considerable advantages of their approach were that N-BEATS is interpretable, fast to train and applicable without modifications to a large variety of target areas. N-BEATS was evaluated against state-of-the-art forecasting models on several non-stationary datasets, including the datasets from M3, M4 and TOURISM competitions. The main finding of the numerical experiments was that N-BEATS is able to outperform traditional models, since it was able to provide more accurate and reliable forecasts. However, N-BEATS does not inherently account for interdependencies within the time series, which implies that it is not able to effectively capture the complex temporal dynamics [21]. In addition, another limitation is that N-BEATS utilizes mean-square-error (MSE)-based loss, which implies that the model struggles to process non-stationary time-series data.

Livieris and Pintelas [11] proposed a new sophisticated algorithmic framework for enhancing the performance of a deep learning model for forecasting non-stationary time-series. The main idea is to improve the quality of the input training data based on the application of a series of data transformations for ensuring the stationarity property by taking into consideration the sampling and dynamics of series. Then, the transformed “high-quality” data are used for fitting and training a deep learning model. The authors conducted an exhaustive experimental analysis studying the performance of two convolutional-based models, which was based on forecasting ability evaluation, directional movement prediction evaluation and forecast reliability evaluation by examining the existence of autocorrelation in the models’ residuals. The presented analysis provided theoretical and empirical evidence about the effectiveness of the proposed framework for enhancing the performance of a deep learning model. Finally, the authors stated that their future work will be concentrated on studying the effect of their proposed approach on several deep learning models with different characteristics, which was noted as a limitation of their research.

Zeng et al. [12] presented an excellent study on the effectiveness of transformer-based models for time-series forecasting. The basic finding of their research was that these models suffer from considerable error accumulation effects, especially in long-term forecasting problems. In addition, the authors proposed a simple forecasting model, named DLinear, which constitutes a combination of the decomposition scheme utilized in transformer models [13,14] with linear layers. DLinear was evaluated against state-of-the-art transformer models, Informer [22], AutoFormer [13], PyraFormer [23] and FEDFormer [14], on nine widely used real-world datasets, providing the best-overall performance. Based on their extensive experimental analysis, the authors concluded that transformers are models with high computational cost and in many cases exhibit inferior performance to a simple model for long-term forecasting. However, although DLinear was characterized as a powerful model in many studies, its feature extraction ability is limited due to its linear-based structure [24].

In this work, a new prediction model is proposed for multi-step time-series forecasting, named C-KAN, which is based on the employment of convolutional layers in a Kolmogorov–Arnold network architecture. The considerable advantages of the proposed model’s architecture are that it is able to efficiently learn the behavior and internal representation of time-series data as well as allowing the differentiated treatment of features and potentially more precise control over inputs’ influence on outputs. Furthermore, the proposed model is trained using the DILATE loss function [8], which ensures that it is able to effectively deal with the dynamics and high volatility of non-stationary time-series data. The performance of the proposed C-KAN model is compared against state-of-the-art forecasting models on five challenging time-series datasets. This comprehensive experimental analysis provides empirical and statistical evidence about the robustness of the C-KAN model, illustrating the efficacy and effectiveness of the proposed forecasting model.

3. C-KAN Time-Series Forecasting Model

In this section, a detailed description of the proposed C-KAN (convolutional-based Kolmogorov–Arnold network) model is provided for multi-step time-series forecasting, which constitutes the main contribution of this research. Special attention is paid to the presentation of its main characteristics and advantages.

3.1. Model’s Architecture

Figure 1 presents a high-level description of C-KAN’s model architecture design for time-series forecasting, which is based on the employment of a Kolmogorov–Arnold network on the top of a series of convolutional layers. The rationale behind the proposed approach is to exploit convolutional layers’ ability to extract useful knowledge and learn the internal representation of the time-series data, together with the effectiveness of Kolmogorov–Arnold networks in modeling complex data patterns with non-linear relationships, as well as the use of DILATE loss for precise shape and temporal change detection. In simple terms, convolutional layers are used for feature extraction from the noisy and non-stationary time-series input data, while the generated high-quality convolved features are exploited by a Kolmogorov–Arnold network to provide an estimation of the future values. In addition, the employment of the DILATE loss function ensures the effective handling of sharp changes in target values and high volatility, which characterize non-stationary time-series. Note that between the convolutional layers and Kolmogorov–Arnold network, a pooling layer may be used, which aims to produce a refined version of the convolved features that the convolutional layer produced. In other words, the scope of the pooling operation is to assist the model to be more robust, since small changes in the input will not change the pooled output values [25].

For the calculation of the predictions $\hat{y}_{t+1}, \hat{y}_{t+2}, \dots, \hat{y}_{t+h}$ at time-step t, the C-KAN model takes as its input k historical time-steps $y_{t}, y_{t-1}, \dots, y_{t-k+1}$, which are initially processed by a series of convolutional layers and a pooling layer, transforming them into a latent representation $x$. Mathematically, this operation can be represented by

$$x = p(W_{c} \odot [y_{t}, y_{t-1}, \dots, y_{t-k+1}] + b_{c})$$

where p is the pooling function, $\odot$ denotes the convolution operation, $W_{c}$ represents the convolution filters and $b_{c}$ is the bias term. After the application of these layers, the resulting feature vector $x$ captures the temporal dependencies in the input time series.
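The feature-extraction step above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: stride 1, no padding, non-overlapping max pooling, and the sliding dot product used by deep learning frameworks as the "convolution". Any activation between the convolution and the pooling step is left out, and the filters and window values are made up for the example.

```python
def conv1d_valid(window, kernel, bias):
    """Slide one filter over the window (assumed: stride 1, no padding)."""
    width = len(kernel)
    return [
        sum(w * window[i + j] for j, w in enumerate(kernel)) + bias
        for i in range(len(window) - width + 1)
    ]


def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing partial block is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


def extract_features(window, filters, biases, pool_size=2):
    """Compute x = p(W_c * window + b_c), flattened over all filters."""
    features = []
    for kernel, bias in zip(filters, biases):
        convolved = conv1d_valid(window, kernel, bias)
        features.extend(max_pool1d(convolved, pool_size))
    return features


# Hypothetical look-back window and two filters of size 2:
# the first computes first differences, the second a moving average.
window = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]
filters = [[-1.0, 1.0], [0.5, 0.5]]
biases = [0.0, 0.0]
x = extract_features(window, filters, biases)
```

With six inputs and a filter of size 2, each filter yields five convolved values, which pooling of size 2 reduces to two, so the latent vector `x` has four entries here.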

Then, vector $x$ is used as an input in an L-layer Kolmogorov–Arnold network for calculating the predictions, that is,

$$\hat{y}_{t+1}, \hat{y}_{t+2}, \dots, \hat{y}_{t+h} = (\Phi_{L-1} \circ \Phi_{L-2} \circ \dots \circ \Phi_{1} \circ \Phi_{0})\, x$$

where $\Phi_{l}$ is the function matrix corresponding to the l-th layer, with $l = 0, 1, \dots, L-1$, defined by

$$\Phi_{l} = \begin{bmatrix} \phi_{l,1,1} & \phi_{l,1,2} & \dots & \phi_{l,1,n_{l}} \\ \phi_{l,2,1} & \phi_{l,2,2} & \dots & \phi_{l,2,n_{l}} \\ \vdots & \vdots & \ddots & \vdots \\ \phi_{l,n_{l+1},1} & \phi_{l,n_{l+1},2} & \dots & \phi_{l,n_{l+1},n_{l}} \end{bmatrix}$$

where $\phi_{l,i,j}$ is the activation function connecting the j-th node of the l-th layer with the i-th node of the $(l+1)$-th layer. Moreover, the univariate function to be learned, $\phi_{l,i,j}$, is parameterized as a B-spline, namely

$$\phi_{l,i,j}(x) = w \left( b(x) + \sum_{m=1}^{M} c_{m} B_{m}^{(l,i,j)}(x) \right)$$

where $w \in \mathbb{R}$ is a parameter which controls the magnitude of $\phi_{l,i,j}(x)$, $b(x) = x / (1 + e^{-x})$ is a basis function, $c_{m}$ is a trainable parameter and $B_{m}^{(l,i,j)}(x)$ is a B-spline.
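To make the edge parameterization concrete, the sketch below evaluates one learnable edge function as w(b(x) + Σ c_m B_m(x)) and applies a layer of such functions. It is an illustration only, not the paper's implementation. The B-spline basis is computed with the standard Cox-de Boor recursion, and the uniform knot vector, cubic degree and coefficient values are assumptions chosen for the example. Rows of `edges` index output nodes and columns index input nodes, mirroring the layout of the function matrix above.

```python
import math


def bspline_basis(x, knots, m, degree):
    """Cox-de Boor recursion for the m-th B-spline basis function (0-indexed)."""
    if degree == 0:
        return 1.0 if knots[m] <= x < knots[m + 1] else 0.0
    left_den = knots[m + degree] - knots[m]
    right_den = knots[m + degree + 1] - knots[m + 1]
    left = 0.0
    if left_den != 0:
        left = (x - knots[m]) / left_den * bspline_basis(x, knots, m, degree - 1)
    right = 0.0
    if right_den != 0:
        right = ((knots[m + degree + 1] - x) / right_den
                 * bspline_basis(x, knots, m + 1, degree - 1))
    return left + right


def silu(x):
    """Basis function b(x) = x / (1 + exp(-x))."""
    return x / (1.0 + math.exp(-x))


def edge_activation(x, w, coeffs, knots, degree=3):
    """phi(x) = w * (b(x) + sum_m c_m * B_m(x)) for a single edge."""
    spline = sum(c * bspline_basis(x, knots, m, degree)
                 for m, c in enumerate(coeffs))
    return w * (silu(x) + spline)


def kan_layer(inputs, edges):
    """Apply one layer: output i is the sum over j of edges[i][j](inputs[j])."""
    return [sum(phi(x_j) for phi, x_j in zip(row, inputs)) for row in edges]


# Assumed uniform knots on [-5, 5]; a cubic spline then has 11 - 3 - 1 = 7 bases.
knots = [float(v) for v in range(-5, 6)]
coeffs = [1.0] * 7
value = edge_activation(0.5, w=2.0, coeffs=coeffs, knots=knots)
```

With all coefficients equal to one, the spline term is constant on the interior interval [-2, 2), so `value` reduces to 2(b(0.5) + 1). This gives a quick sanity check that the basis functions sum to one there.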

At this point, it is worth mentioning that a Kolmogorov–Arnold network simply constitutes a combination of B-splines and an MLP-based architecture, and avoids the drawbacks and exploits the strengths of both. In detail, the former is accurate for approximating low-dimensional functions, while the latter is excellent in modeling compositional structures due to its feature learning ability. Therefore, it is able to learn the complex internal structure of time-series data (due to the MLP-based architecture) and optimize the learned features (due to the utilization of B-splines).

In summary, the major innovations in the proposed model’s architecture are the use of (i) convolutional layers, which are dedicated to learning the behavior and internal representation of time-series input data; (ii) activation at the edges of the Kolmogorov–Arnold network, which is able to potentially alter learning dynamics and enhance interpretability; and (iii) modular non-linearity by applying non-linearity before summing the inputs of the Kolmogorov–Arnold network, to allow the differentiated treatment of features and potentially more precise control over inputs’ influence on outputs. These innovations in the forecasting model architecture aim at effectively handling complex time-series data and ultimately lead to the development of robust forecasting models.

3.2. DILATE Loss Function

For maximizing the performance of the proposed C-KAN model, especially for forecasting non-stationary time series, DILATE loss is employed [8]. This differentiable loss focuses on predicting sudden changes by explicitly disentangling the penalization into two distinct components, related to the shape and to the temporal localization errors of change detection, that is,

$$L(y, \hat{y}) = w_{1} L_{\mathrm{shape}}(y, \hat{y}) + w_{2} L_{\mathrm{temporal}}(y, \hat{y}) \tag{2}$$

where $w_{1}, w_{2} \in [0, 1]$, with $w_{1} + w_{2} = 1$, control the weight between the components $L_{\mathrm{shape}}$ and $L_{\mathrm{temporal}}$, which focus on supporting shape detection and temporal trends, respectively.

The shape component $L_{\mathrm{shape}}$ is based on a smoothed version of Dynamic Time Warping [26], namely

$$L_{\mathrm{shape}}(y, \hat{y}) = -\gamma \log \left( \sum_{A \in \mathcal{A}_{h \times h}} e^{-\frac{\langle A \,|\, \Delta(y, \hat{y}) \rangle}{\gamma}} \right)$$

where $\gamma > 0$ is a parameter, $\langle \cdot \,|\, \cdot \rangle$ is the dot product, $\mathcal{A}_{h \times h}$ is the set of all valid paths connecting the endpoints $(1, 1)$ to $(h, h)$ (with authorized moves $\to, \downarrow, \searrow$), $\Delta(y, \hat{y}) = [\delta(y_{i}^{(h)}, \hat{y}_{i}^{(j)})]$ is the pairwise cost matrix and $\delta$ is the dissimilarity between $y^{(h)}$ and $\hat{y}^{(j)}$.
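The sum over all warping paths in the shape component does not have to be enumerated explicitly. A minimal sketch, assuming the squared difference as the dissimilarity δ and the standard soft-DTW dynamic-programming recursion (a soft minimum over the three authorized moves), is given below. The function names are hypothetical.

```python
import math


def soft_min(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma)))."""
    finite = [v for v in values if v != math.inf]
    if not finite:
        return math.inf
    lowest = min(finite)  # shift by the minimum for numerical stability
    total = sum(math.exp(-(v - lowest) / gamma) for v in finite)
    return lowest - gamma * math.log(total)


def shape_loss(y, y_hat, gamma=0.1):
    """Soft-DTW value between two equal-length sequences."""
    if len(y) != len(y_hat):
        raise ValueError("sequences must have the same length")
    h = len(y)
    # Pairwise cost matrix; squared difference is an assumed choice of delta.
    cost = [[(y[i] - y_hat[j]) ** 2 for j in range(h)] for i in range(h)]
    r = [[math.inf] * (h + 1) for _ in range(h + 1)]
    r[0][0] = 0.0
    for i in range(1, h + 1):
        for j in range(1, h + 1):
            previous = (r[i - 1][j], r[i][j - 1], r[i - 1][j - 1])
            r[i][j] = cost[i - 1][j - 1] + soft_min(previous, gamma)
    return r[h][h]


target = [0.0, 0.0, 1.0, 1.0]
shifted = [0.0, 1.0, 1.0, 1.0]   # same step, one time step early
wrong = [1.0, 0.0, 1.0, 0.0]     # different shape
```

Because warping can realign the early step, `shape_loss(target, shifted)` stays near zero while `shape_loss(target, wrong)` is clearly larger. Timing errors of this kind are therefore left to the temporal component.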

The temporal component $L_{\mathrm{temporal}}$ is based on the computation of the Time Distortion Index [27] for temporal misalignment estimation and penalizes any temporal irregularities between the ground truth and predicted values, that is,

$$L_{\mathrm{temporal}}(y, \hat{y}) = \frac{1}{Z} \sum_{A \in \mathcal{A}_{h \times h}} \langle A \,|\, \Omega \rangle \, e^{-\frac{\langle A \,|\, \Delta(y, \hat{y}) \rangle}{\gamma}}$$

where $Z = \sum_{A \in \mathcal{A}_{h \times h}} e^{-\frac{\langle A \,|\, \Delta(y, \hat{y}) \rangle}{\gamma}}$ and $\Omega$ is an $h \times h$ square matrix with zero diagonal elements, which penalizes each element of $y$ being associated with an element of $\hat{y}$ at a different time step. In our experiments, similar to [8], each non-diagonal element of $\Omega$ is set to $\omega(i, j) = h^{-2} (i - j)^{2}$.
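For very short horizons, both components can be evaluated directly from their definitions by enumerating every valid path. The brute-force sketch below does exactly that and then combines the two terms as in Equation (2). It is meant only to illustrate the formulas: the number of paths grows exponentially with h, the squared difference is an assumed choice of δ, and the function names and example sequences are hypothetical.

```python
import math


def warping_paths(h):
    """Yield all paths from (0, 0) to (h-1, h-1) using right, down, diagonal moves."""
    def extend(path):
        i, j = path[-1]
        if (i, j) == (h - 1, h - 1):
            yield path
            return
        for di, dj in ((0, 1), (1, 0), (1, 1)):
            if i + di < h and j + dj < h:
                yield from extend(path + [(i + di, j + dj)])
    yield from extend([(0, 0)])


def dilate_terms(y, y_hat, gamma=0.1):
    """Return (shape, temporal) computed by explicit path enumeration."""
    h = len(y)
    z = 0.0
    weighted_penalty = 0.0
    for path in warping_paths(h):
        cost = sum((y[i] - y_hat[j]) ** 2 for i, j in path)    # <A | Delta>
        penalty = sum((i - j) ** 2 / h ** 2 for i, j in path)  # <A | Omega>
        weight = math.exp(-cost / gamma)
        z += weight
        weighted_penalty += penalty * weight
    shape = -gamma * math.log(z)
    temporal = weighted_penalty / z
    return shape, temporal


def dilate_loss(y, y_hat, w1=0.5, w2=0.5, gamma=0.1):
    """Weighted sum of the two components, with w1 + w2 = 1."""
    shape, temporal = dilate_terms(y, y_hat, gamma)
    return w1 * shape + w2 * temporal


target = [0.0, 0.0, 1.0, 1.0]
aligned = [0.0, 0.0, 1.0, 1.0]
shifted = [0.0, 1.0, 1.0, 1.0]   # the step arrives one time step early
```

In this toy case the shifted prediction has the same shape as the target, but its low-cost paths must leave the diagonal, so its temporal term is larger than that of the aligned prediction.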

Note that the main disadvantage of training a deep learning model with traditional MSE-based losses is that the model is not able to capture sharp changes in target values. In contrast, DILATE loss is able to accurately balance shape and temporal alignment as well as providing flexibility in balancing different aspects of the prediction task, therefore leading to the development of a robust forecasting model.

4. Experimental Analysis

In this section, the performance of the proposed C-KAN model (the implementation code can be found in https://github.com/ioannislivieris/C-KAN, accessed on 15 August 2024) is compared against that of the most effective deep learning models for non-stationary time-series. The primary aim is to provide convincing evidence about the efficiency and effectiveness of the proposed model. The experimental results are based on five well-known and challenging non-stationary time-series datasets from different application domains.

Bitcoin dataset. This dataset concerns daily Bitcoin values from 1 January 2020 to 31 December 2023 in USD, with Bitcoin holding the largest market capitalization among cryptocurrencies. The data were obtained from https://finance.yahoo.com and were divided into training/validation/testing sets based on the scheme 70/10/20, while all models were trained to predict the future 4 values given the past 12 values, as in [11].
Gold dataset. This dataset concerns daily Gold values from 1 January 2000 to 31 December 2023 in USD, which were also obtained from https://finance.yahoo.com. The data were divided into training/validation/testing sets based on the scheme 70/10/20, while all models were trained to predict the future 7 values given the past 21 values, as in [25].
Synthetic dataset. This dataset is used for evaluating time-series models’ ability to predict sudden changes based on an input signal composed of two peaks. Each series in the dataset is composed of 40 time steps, with the first 20 used as inputs while the last 20 are used as targets to forecast. In each series, the input range is composed of two peaks of random temporal positions $i_{1}$ and $i_{2}$ and random amplitudes $a_{1} \in [0, 1]$ and $a_{2} \in [0, 1]$. The target range is composed of a step of amplitude $a_{2} - a_{1}$ and stochastic position $i_{2} + (i_{2} - i_{1}) + \mathrm{randint}(-3, 3)$. In addition, all series were corrupted by additive Gaussian white noise (with variance = 0.01). The training dataset contains 450 series for fitting the models, the validation dataset contains 50 series and the testing set contains 500 series, as in [28].
ECG5000 dataset. This dataset is composed of 5000 electrocardiograms (ECG) of length 140, of which 450 are used for training, 50 are used for validation and the remaining 4500 are used for testing [8]. In our experiments, the first 60% of the time steps of each electrocardiogram (84 time steps) were used as inputs to the models, while the last 40% of each series (56 time steps) were used as the targets to forecast, as in [8].
Solar dataset. This dataset contains the solar power production records of 2006, from 137 PV plants in Alabama state, which measure samples every 10 min. In our experiments, the data of a randomly selected PV plant were divided into training/validation/testing sets based on the scheme 70/10/20, while all models were trained to predict the future 12 values given the past 24 values, as in [29].
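The construction of the Synthetic dataset described above can be sketched as follows. Several details are not specified in the text and are assumptions made only for this illustration: the admissible ranges of the two peak positions, the ordering of the peaks, and the clamping of the step position so that it always falls inside the target range. The function name is hypothetical.

```python
import random


def make_synthetic_series(rng, noise_std=0.1, length=40, split=20):
    """Generate one (inputs, targets) pair with two peaks and a step."""
    i1 = rng.randint(1, 9)                # assumed range for the first peak
    i2 = rng.randint(i1 + 1, split - 2)   # assumed: second peak after the first
    a1, a2 = rng.random(), rng.random()   # amplitudes drawn from [0, 1)
    step_at = i2 + (i2 - i1) + rng.randint(-3, 3)
    # Assumption: keep the step inside the target range.
    step_at = max(split, min(length - 1, step_at))

    series = [0.0] * length
    series[i1] = a1
    series[i2] = a2
    for t in range(step_at, length):
        series[t] = a2 - a1               # step of amplitude a2 - a1

    # Additive Gaussian white noise; variance 0.01 means std 0.1.
    noisy = [v + rng.gauss(0.0, noise_std) for v in series]
    return noisy[:split], noisy[split:]


rng = random.Random(0)
train = [make_synthetic_series(rng) for _ in range(450)]
clean_inputs, clean_targets = make_synthetic_series(rng, noise_std=0.0)
```

Setting `noise_std=0.0` returns the noiseless signal, which is convenient for checking that the final target value equals the difference between the two peak amplitudes.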

It is worth mentioning that all non-stationary time-series data used in this study contained no missing values, while the outlier values were not removed so as to not destroy the dynamics of each series. In addition, no stationary time-series datasets were included, since they are generally considered considerably easier to analyze [6].

Next, we evaluate the performance of the following:

“Seq2Seq”, which stands for the Seq2Seq model proposed by Le Guen and Thome [8].
“N-BEATS”, which stands for the N-BEATS model proposed by Oreshkin et al. [10].
“Conv-LSTM-Att”, which stands for the Convolutional Long Short-Term Memory Attention forecasting model proposed by Livieris and Pintelas [11].
“DeepTime”, which stands for the DeepTime forecasting model proposed by Woo et al. [9].
“DLinear”, which stands for the DLinear model proposed by Zeng et al. [12].
“C-KAN”, which stands for the proposed forecasting model.

To conduct a fair comparison, since the time-series data are non-stationary, DeepTime, DLinear and N-BEATS are trained using the DILATE loss function (2) with a learning rate of $lr = 10^{-4}$, $\alpha = 0.5$ and $\gamma = 0.1$. In contrast, Conv-LSTM-Att is trained using the mean-square-error loss function, since the data are transformed to stationary using the methodology proposed in [11]. All state-of-the-art models were trained with Adaptive Moment Estimation (ADAM) [30], while the rest of the hyperparameters were set to default, since any change resulted in similar performance or degradation. In contrast, any changes in the hyperparameters of DILATE loss considerably affected and degraded the performance of all models. This highlights the importance of the utilization of this advanced loss function for dealing with complex and non-stationary time series, which are characterized by high volatility and fluctuations. C-KAN was trained using the DILATE loss function (2) with $\alpha = 0.9$ and $\gamma = 0.1$, employing L-BFGS [31] as a training algorithm. C-KAN consists of one convolutional layer of four filters of size (2,) for the Bitcoin and Gold datasets, and eight filters of size (2,) for the Synthetic, ECG5000 and Solar datasets, followed by a GELU activation function [32] and a max pooling layer of size (2,). Then, a Kolmogorov–Arnold network is employed with the topology [6,3], [8,3], [5,5], [40,20,40], [24] for the Bitcoin, Gold, Synthetic, ECG5000 and Solar datasets, respectively. Note that more information about the selected C-KAN architecture is presented in Section 4.2.

The performance of all forecasting models is evaluated using the metrics Root Mean Square Error (RMSE), R-squared (R²), the Hausdorff score (HD) and the RAMP score (RAMP), which are, respectively, defined by

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_{i} - \hat{y}_{i})^{2}}, \qquad R^{2} = 1 - \frac{\sum_{i=1}^{N} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{N} (y_{i} - \bar{y})^{2}},$$

$$\mathrm{HD} = \max \left( \max_{\hat{y}_{i} \in \hat{y}} \min_{y_{i} \in y} | \hat{y}_{i} - y_{i} |, \; \max_{y_{i} \in y} \min_{\hat{y}_{i} \in \hat{y}} | y_{i} - \hat{y}_{i} | \right), \qquad \mathrm{RAMP} = \frac{100}{N} \sum_{i=1}^{N} \frac{2 | \hat{y}_{i} - y_{i} |}{| \hat{y}_{i} | + | y_{i} |}$$

where N is the number of forecasts, $y_{i}$ and $\hat{y}_{i}$ are the actual and predicted values, respectively, and $\bar{y}$ is the mean of the actual values. Note that RMSE and R² focus on the prediction errors and the correlation between the predicted and the actual values, respectively, while HD and RAMP focus on the model’s ability to track temporal patterns and capture the amplitude of changes, respectively.
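The four metrics can be written compactly in plain Python. This is a minimal sketch rather than the paper's evaluation code: R² is implemented in its usual coefficient-of-determination form, with the residual sum of squares in the numerator, and the example values are made up.

```python
import math


def rmse(y, y_hat):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y))


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def hausdorff(y, y_hat):
    """Symmetric Hausdorff distance between the two sets of values."""
    forward = max(min(abs(p - a) for a in y) for p in y_hat)
    backward = max(min(abs(a - p) for p in y_hat) for a in y)
    return max(forward, backward)


def ramp(y, y_hat):
    """RAMP score as defined in the text (a symmetric percentage error)."""
    terms = (2.0 * abs(p - a) / (abs(p) + abs(a)) for a, p in zip(y, y_hat))
    return 100.0 / len(y) * sum(terms)


# Hypothetical actual and predicted values; only the last forecast is off.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
```

For these values the RMSE is 1, R² is 0.2, HD is 2 and RAMP is 10. Note that `ramp` is undefined when an actual value and its prediction are both zero.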

4.1. Numerical Experiments

Table 1 presents the performance of all forecasting models on the Bitcoin, Gold, Synthetic, ECG5000 and Solar datasets. Note that the best performance for each metric is highlighted in bold, while DeepTime was not applied to Synthetic, ECG5000 and Solar, since no time indices were available for these datasets. Clearly, the proposed C-KAN forecasting model presents the best overall performance, outperforming the traditional state-of-the-art models. As regards the RMSE and R² metrics, C-KAN exhibits the best score for all datasets, followed by the Seq2Seq model. This implies that C-KAN not only achieves the best RMSE values, indicating high accuracy in predictions, but also secures the highest R² values, reflecting a strong correlation between the predicted and actual values across all datasets. In addition, C-KAN reports the second best score for the HD metric and the top score relative to the RAMP metric in three out of five datasets. The comprehensive evaluation across five challenging datasets establishes C-KAN as the most effective forecasting model among the evaluated ones relative to all performance metrics, followed by the Seq2Seq and DLinear models. It is also worth mentioning that the performance of the C-KAN model is notably better on relatively small datasets (i.e., Bitcoin and Gold), while it remains competitive on larger datasets like ECG5000, Synthetic and Solar.

To provide strong statistical evidence of the effectiveness of the proposed forecasting model, a statistical analysis is conducted for evaluating the hypothesis H_{0} that the forecasting models exhibit equally accurate predictions. For this purpose, the non-parametric Friedman Aligned-Ranks (FAR) test [33] and the Li post hoc test [34] with a significance level of α = 5% are employed. The former ranks the forecasting models based on a performance metric, while the latter examines whether the differences between their predictions are statistically significant, without any assumption about the distribution of the performance scores [35,36].
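The ranking step of the FAR test can be sketched as follows: each score is aligned by subtracting the mean score of its dataset, all aligned scores are ranked jointly (ties share the average rank), and the ranks are averaged per model. The snippet is an illustrative sketch, not the statistical code used for Table 2. The function names are assumptions, the inputs are the rounded RMSE values from Table 1 (DeepTIMe is excluded because it was not run on every dataset), and the resulting ranks therefore need not coincide with those reported in Table 2.

```python
def average_ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def friedman_aligned_ranks(scores):
    """scores[d][m] is the score of model m on dataset d (lower is better)."""
    n_datasets, n_models = len(scores), len(scores[0])
    aligned = []
    for row in scores:
        location = sum(row) / n_models
        # Rounding keeps ties exact despite floating-point noise.
        aligned.extend(round(value - location, 10) for value in row)
    ranks = average_ranks(aligned)
    return [sum(ranks[d * n_models + m] for d in range(n_datasets)) / n_datasets
            for m in range(n_models)]

models = ["Seq2Seq", "N-BEATS", "Conv-LSTM-Att", "DLinear", "C-KAN"]
rmse_table = [
    [1465.83, 3566.24, 2062.79, 1739.83, 1246.08],  # Bitcoin
    [42.78, 49.35, 49.6, 44.81, 32.0],              # Gold
    [0.14, 0.14, 0.12, 0.25, 0.11],                 # Synthetic
    [0.44, 0.42, 0.50, 0.58, 0.42],                 # ECG5000
    [5.34, 5.15, 6.32, 5.77, 5.06],                 # Solar
]

mean_ranks = dict(zip(models, friedman_aligned_ranks(rmse_table)))
for name, rank in sorted(mean_ranks.items(), key=lambda item: item[1]):
    print(f"{name}: {rank:.1f}")
```

A lower average aligned rank indicates a better model. With five models and five datasets the ranks run from 1 to 25, so the per-model averages always sum to 65, which is also the case for each block of Table 2.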

Table 2 presents the statistical analysis based on the RMSE, HD, RAMP and R² performance metrics. The FAR test suggests that the proposed C-KAN model reports the highest probability ranking, followed by the Seq2Seq model, relative to all performance metrics. In addition, the Li post hoc test demonstrates that there exist significant differences between the performance of C-KAN and the rest of the forecasting models as regards the RMSE, RAMP and R² metrics, while for the HD metric, the post hoc test suggests that C-KAN, Seq2Seq and DLinear exhibit similar performance.

In summary, both the experimental results and the statistical analysis provide evidence of the effectiveness and robustness of the proposed C-KAN forecasting model, especially for short-term predictions.

4.2. Ablation Study

Next, an ablation study is conducted to measure the performance of the proposed C-KAN model for various hidden layer sizes and to study the effect of combining convolutional layers and the DILATE loss with a Kolmogorov–Arnold network.

Table 3 presents the performance of the proposed C-KAN forecasting model for different architecture designs (number and size of hidden layers) on the Bitcoin, Gold, Synthetic, ECG5000 and Solar datasets; the best performance for each metric and dataset is highlighted in bold. Note that in our original experiments, we evaluated several settings (number of hidden layers and their sizes) using a grid-search strategy, and the most representative were selected for providing deeper insight into C-KAN’s sensitivity. Clearly, the interpretation of Table 3 suggests that the C-KAN model’s performance generally improves as the size of the Kolmogorov–Arnold network increases. Note that in our experiments, any attempt to further increase the number of nodes in the hidden layer(s) resulted in no performance improvement.

For the Bitcoin dataset, the best performance in terms of RMSE is observed with the hidden layer configuration [6,3], achieving an RMSE of 1130.37, while the same configuration also shows the highest R² score of 0.96. On the other hand, for the Gold dataset, the configuration [8,3] exhibits the best RMSE and R² scores of 32.0 and 0.89, respectively, indicating that a larger but shallow network can effectively capture the underlying patterns in this dataset. In the Synthetic dataset, the configuration [10,10] achieves the best score relative to all performance metrics, closely followed by the configuration [5,5], suggesting that the use of a network with two hidden layers is essential for this benchmark. For the ECG5000 dataset, the best performance is obtained with the [40,20,40] configuration, which achieves an RMSE of 0.42 and an R² of 0.82, highlighting the benefits of a deeper and more complex network for more intricate time-series data. Finally, for the Solar dataset, the utilization of one hidden layer of 10 neurons presents the best results. It is worth mentioning that an increase of the number of neurons to 20 and 32 results in similar performance, while the addition of a second hidden layer demonstrates accuracy degradation.

In summary, the interpretation of Table 3 indicates that an increase in the number and size of hidden layers generally enhances the C-KAN model’s performance, up to a certain limit. Any further increase in complexity does not necessarily lead to a performance improvement, which emphasizes the importance of careful architectural tuning tailored to the specific characteristics of the dataset in question. This ablation study highlights the flexibility and adaptability of the C-KAN model, capable of being fine-tuned to achieve superior forecasting accuracy across diverse time-series datasets.

Next, the performance of the proposed model is compared against that of a traditional Kolmogorov–Arnold network (KAN) model for time-series forecasting [20], in which the input time-series data are fed directly to the network without first being processed by any convolutional layer. Table 4 summarizes the performance evaluation of the KAN and C-KAN models across all datasets. Note that the performance of both the KAN and C-KAN models is evaluated using both DILATE and MSE losses. Similar to the previous experiments, the best performance for each metric and dataset is highlighted in bold. In general, C-KAN demonstrates better performance compared to KAN relative to all performance metrics. Interestingly, while C-KAN outperforms KAN in terms of RMSE and R², the differences in the HD and RAMP metrics are less pronounced. This suggests that although the convolutional layers contribute to better accuracy in predicting the overall trend of the time-series, they may not significantly impact the model’s ability to capture other aspects of the data, such as temporal dynamics and amplitude changes. In contrast, the DILATE loss function enhances the performance of both models, considerably improving their ability to capture sharp changes in target values, which characterize the selected non-stationary and complex datasets. Finally, it is worth highlighting that both KAN and C-KAN present similar performance for both losses, with slightly better performance in terms of sudden change prediction when trained with the DILATE loss.

Overall, the presented results indicate that the integration of convolutional layers to a Kolmogorov–Arnold network enhances its forecasting capabilities in terms of reducing prediction errors, ultimately leading to more accurate and reliable predictions; meanwhile, the utilization of the DILATE loss function demonstrates that the C-KAN model is able to deal with high volatility and fluctuations present in complex and non-stationary time series.
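To make the contribution of the convolutional layers concrete, the relative RMSE reduction of C-KAN over KAN can be computed directly from Table 4. The short sketch below uses the RMSE values of the two models trained with the DILATE loss; the helper name `relative_reduction` is an assumption introduced for illustration.

```python
# RMSE values copied from Table 4 (both models trained with the DILATE loss).
RMSE_DILATE = {
    "Bitcoin": {"KAN": 1439.50, "C-KAN": 1246.08},
    "Gold": {"KAN": 42.21, "C-KAN": 32.0},
    "Synthetic": {"KAN": 0.12, "C-KAN": 0.11},
    "ECG5000": {"KAN": 0.42, "C-KAN": 0.42},
    "Solar": {"KAN": 5.40, "C-KAN": 5.06},
}

def relative_reduction(baseline, candidate):
    """Percentage by which `candidate` lowers an error metric relative to `baseline`."""
    return 100.0 * (baseline - candidate) / baseline

reductions = {
    dataset: relative_reduction(values["KAN"], values["C-KAN"])
    for dataset, values in RMSE_DILATE.items()
}

for dataset, percent in reductions.items():
    print(f"{dataset}: {percent:.1f}% lower RMSE with convolutional layers")
```

On these reported values the reduction is about 13% for Bitcoin and 24% for Gold, while the two models tie on ECG5000.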

4.3. Qualitative Examples

For completeness, some qualitative examples of Seq2Seq, KAN and C-KAN are provided for the Bitcoin, Gold, Synthetic, ECG5000 and Solar datasets (Figure 2, Figure 3, Figure 4, Figure 5 and Figure 6). Note that Seq2Seq was selected since it presented the best performance among all evaluated state-of-the-art forecasting models. Each figure shows the models’ inputs (past values), target values (future values) and forecasts (predictions). The vertical axis denotes the value of the time series, while the horizontal axis denotes the time steps. C-KAN presents the best performance in terms of prediction accuracy, since it is able to accurately predict the shape and the timing of changes in the target values. In addition, the interpretation of Figure 2, Figure 3, Figure 4, Figure 5 and Figure 6 reveals that although the predictions of all models have the correct shape and a precise temporal localization, the predictions of C-KAN are much closer to the target values, which implies that the proposed model is able to better leverage the knowledge contained in the training data and make more accurate forecasts.

5. Conclusions and Future Research

In this work, a new time-series model for multi-step forecasting was proposed, named C-KAN, which is based on leveraging convolutional layers with Kolmogorov–Arnold networks. The proposed model’s architecture has three main characteristics: (i) convolutional layers for learning the behavior and internal representation of time-series input data, (ii) activation at the edges of the Kolmogorov–Arnold network for altering training dynamics and (iii) modular non-linearity for allowing the differentiated treatment of features and potentially more precise control over inputs’ influence on outputs. A notable advantage of this new type of model architecture is its ability to handle complex time-series data and ultimately lead to the development of effective and robust time-series forecasting models. Furthermore, the proposed model was trained using the DILATE function [8], which ensures that it is able to effectively deal with non-stationary time-series.

The performance of the C-KAN model was compared against state-of-the-art forecasting models on five challenging non-stationary time-series datasets. The reported numerical results revealed that the proposed model was able to outperform all traditional forecasting models relative to all the performance metrics. The statistical analysis ensured the robustness and effectiveness of the C-KAN model. Specifically, the FAR test ranked C-KAN highest across all performance metrics, while the Li post hoc test demonstrated significant differences between the performance of C-KAN and the performance of the traditional models. Consequently, the experimental analysis provides empirical and statistical evidence, which illustrates that C-KAN constitutes an efficient and accurate model, well suited for time-series forecasting tasks.

Our ablation study highlighted the impact of different configurations on the model’s performance. In detail, it was noticed that an increase in the number and size of hidden layers generally enhanced the model’s forecasting ability up to a certain limit. Clearly, a limitation of the proposed model is that its performance heavily depends on the configuration of the Kolmogorov–Arnold network, highlighting the importance of tailored architectural tuning. Therefore, it is worth highlighting that a hyper-parameter tuning step based on sophisticated strategies such as Bayesian optimization [37] or Hyperband and Successive Halving [38] should be applied for maximizing the model’s performance and effectiveness; hence, a comparison study using various hyper-parameter settings and tuning strategies will be considered in our future research tasks. Furthermore, the effect of the combination of convolutional layers with a Kolmogorov–Arnold network was also studied, which revealed that convolutional layers are able to provide more valuable information than the historical input values of the Kolmogorov–Arnold network, hence leading to more precise forecasts. It is also worth highlighting that the convolutional layers enhanced the forecasting capabilities of the Kolmogorov–Arnold network mainly in terms of reducing prediction errors, while having less impact on capturing temporal dynamics and amplitude changes. Finally, the ablation study demonstrated that the utilization of the DILATE loss function has a considerable effect on the performance of the proposed model, improving its ability to capture sharp changes in target values, which characterize the selected non-stationary and challenging datasets.
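As an illustration of the kind of tuning step mentioned above, the following is a minimal sketch of Successive Halving over the hidden-layer configurations evaluated for the Gold dataset in Table 3. It is not part of the reported experiments. The objective `toy_validation_loss` is a hypothetical stand-in that simply approaches the RMSE reported in Table 3 as the budget grows, and the budget schedule (`min_budget`, `eta`) is an assumption; a real objective would train C-KAN for the given budget and return its validation error.

```python
# Final RMSE on the Gold dataset for each hidden-layer configuration (Table 3).
GOLD_RMSE = {(6,): 40.55, (24,): 36.57, (6, 3): 36.65, (8, 3): 32.00}

def toy_validation_loss(config, budget):
    """Hypothetical stand-in for training `config` for `budget` epochs.

    The loss decays towards the Table 3 RMSE as the budget increases.
    """
    return GOLD_RMSE[config] * (1.0 + 1.0 / budget)

def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Keep the best 1/eta of the configurations per round, growing the budget by eta."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda config: evaluate(config, budget))
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0]

best_config = successive_halving(GOLD_RMSE, toy_validation_loss)
print("selected hidden layers:", list(best_config))
```

The appeal of this strategy is that weak configurations are discarded after a small budget, so most of the training time is spent on the promising ones; Hyperband [38] repeats the procedure with different trade-offs between the number of configurations and the starting budget.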

The main limitation of the proposed approach is that the process of training the C-KAN model may be slow, since instead of a single scalar weight value for each input, we now might be tuning multiple parameters (used in the learnable univariate B-spline function), which is an overhead. In addition, the C-KAN training process requires heavy computational resources, which implies that its application for large datasets and the process of tailored architectural tuning might be challenging as well. By also taking into consideration the performance of the C-KAN model on datasets of various sizes and different forecasting horizons, we are able to conclude that this model is probably preferable for relatively small datasets and for short-term forecasts. However, more experiments are needed in order to experimentally clarify this statement. Another limitation that could be considered is the fact that in the current research, we focused our attention on challenging univariate non-stationary time-series, which are characterized by noise, high volatility and significant fluctuations; hence, a performance evaluation study on other datasets possessing other characteristics such as seasonality and/or trend [39,40] is certainly included in the future plans.
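The parameter overhead described above can be made concrete with a rough count. The sketch below assumes, following the original KAN formulation [18], that every edge carries a B-spline with (grid size + spline order) coefficients, whereas a multilayer perceptron carries a single weight per edge. The layer widths, grid size and spline order used here are hypothetical values chosen for illustration, not the settings of the reported experiments, and biases as well as any per-edge scaling parameters are ignored.

```python
def count_edges(widths):
    """Number of edges in a fully connected network with the given layer widths."""
    return sum(n_in * n_out for n_in, n_out in zip(widths, widths[1:]))

def mlp_weights(widths):
    """One scalar weight per edge (biases ignored)."""
    return count_edges(widths)

def kan_spline_coefficients(widths, grid_size=5, spline_order=3):
    """(grid_size + spline_order) learnable coefficients per edge (assumed values)."""
    return count_edges(widths) * (grid_size + spline_order)

# Hypothetical widths: 16 extracted features, hidden layers [8, 3], 4-step horizon.
widths = [16, 8, 3, 4]

print("MLP weights:", mlp_weights(widths))
print("KAN spline coefficients:", kan_spline_coefficients(widths))
```

Under these assumptions, the Kolmogorov–Arnold network has eight times as many trainable parameters as a perceptron of the same shape (1312 versus 164), which is consistent with the slower training discussed above.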

Our future work is concentrated on applying the decomposition scheme utilized in transformer models [13,14] to the C-KAN model, as well as on the utilization of other loss functions dedicated to dealing with non-stationary time-series [41]. Another interesting idea is an evaluation on multivariate datasets and with dedicated models for these datasets, including transformer models that efficiently capture temporal dependencies such as Crossformer [40] or focus on attributes of time-series data like seasonality, locality and global temporal dependencies such as Dozerformer [42], as well as a comparison with hybrid models [43,44]. Furthermore, a challenging direction for future research is to enhance the proposed model with attention mechanisms, which are able to increase the forecasting model’s focus on useful information in the input data for generating accurate forecasts [6].

Funding

This research received no external funding.

Data Availability Statement

The data will be made available by the authors on request.

Conflicts of Interest

The author declares no conflict of interest.

References

1. González-Pérez, B.; Núñez, C.; Sánchez, J.L.; Valverde, G.; Velasco, J.M. Expert system to model and forecast time series of epidemiological counts with applications to COVID-19. Mathematics 2021, 9, 1485.
2. Lazcano, A.; Herrera, P.J.; Monge, M. A combined model based on recurrent neural networks and graph convolutional networks for financial time series forecasting. Mathematics 2023, 11, 224.
3. Zhang, Y.; Ma, R.; Liu, J.; Liu, X.; Petrosian, O.; Krinkin, K. Comparison and explanation of forecasting algorithms for energy time series. Mathematics 2021, 9, 2794.
4. Garai, S.; Paul, R.K.; Rakshit, D.; Yeasin, M.; Emam, W.; Tashkandy, Y.; Chesneau, C. Wavelets in combination with stochastic and machine learning models to predict agricultural prices. Mathematics 2023, 11, 2896.
5. Liu, Y.; Wu, H.; Wang, J.; Long, M. Non-stationary transformers: Exploring the stationarity in time series forecasting. Adv. Neural Inf. Process. Syst. 2022, 35, 9881–9893.
6. Livieris, I.E. A novel forecasting strategy for improving the performance of deep learning models. Expert Syst. Appl. 2023, 230, 120632.
7. Shumway, R.H.; Stoffer, D.S. ARIMA models. In Time Series Analysis and Its Applications; Springer: Cham, Switzerland, 2017; pp. 75–163.
8. Le Guen, V.; Thome, N. Shape and time distortion loss for training deep time series forecasting models. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019; Volume 32.
9. Woo, G.; Liu, C.; Sahoo, D.; Kumar, A.; Hoi, S. Learning deep time-index models for time series forecasting. In Proceedings of the International Conference on Machine Learning, PMLR, Honolulu, HI, USA, 23–29 July 2023; pp. 37217–37237.
10. Oreshkin, B.N.; Carpov, D.; Chapados, N.; Bengio, Y. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
11. Livieris, I.E.; Pintelas, P. A novel multi-step forecasting strategy for enhancing deep learning models’ performance. Neural Comput. Appl. 2022, 34, 19453–19470.
12. Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are transformers effective for time series forecasting? Proc. AAAI Conf. Artif. Intell. 2023, 37, 11121–11128.
13. Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Adv. Neural Inf. Process. Syst. 2021, 34, 22419–22430.
14. Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proceedings of the International Conference on Machine Learning, PMLR, Baltimore, MD, USA, 17–23 July 2022; pp. 27268–27286.
15. Alfred, R.; Obit, J.H.; Ahmad Hijazi, M.H.; Ag Ibrahim, A.A. A performance comparison of statistical and machine learning techniques in learning time series data. Adv. Sci. Lett. 2015, 21, 3037–3041.
16. Cerqueira, V.; Torgo, L.; Soares, C. A case study comparing machine learning with statistical methods for time series forecasting: Size matters. J. Intell. Inf. Syst. 2022, 59, 415–433.
17. Schmid, L.; Roidl, M.; Pauly, M. Comparing statistical and machine learning methods for time series forecasting in data-driven logistics—A simulation study. arXiv 2023, arXiv:2303.07139.
18. Liu, Z.; Wang, Y.; Vaidya, S.; Ruehle, F.; Halverson, J.; Soljačić, M.; Hou, T.Y.; Tegmark, M. KAN: Kolmogorov–Arnold Networks. arXiv 2024, arXiv:2404.19756.
19. Schmidt-Hieber, J. The Kolmogorov–Arnold representation theorem revisited. Neural Netw. 2021, 137, 119–126.
20. Xu, K.; Chen, L.; Wang, S. Kolmogorov-Arnold Networks for Time Series: Bridging Predictive Power and Interpretability. arXiv 2024, arXiv:2406.02496.
21. Motavali, A.; Yow, K.C.; Hansmeier, N.; Chao, T.C. DSA-BEATS: Dual Self-Attention N-BEATS Model for Forecasting COVID-19 Hospitalization. IEEE Access 2023, 11, 137352–137365.
22. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. Proc. AAAI Conf. Artif. Intell. 2021, 35, 11106–11115.
23. Liu, S.; Yu, H.; Liao, C.; Li, J.; Lin, W.; Liu, A.X.; Dustdar, S. Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 4 May 2021.
24. Zhou, N.; Zeng, H.; Zhou, J. DLinear-Based Prediction of the RUL of PEMFC. In Proceedings of the 2024 4th International Conference on Consumer Electronics and Computer Engineering (ICCECE), Guangzhou, China, 12–14 January 2024; pp. 221–224.
25. Livieris, I.E.; Pintelas, E.; Pintelas, P. A CNN–LSTM model for gold price time-series forecasting. Neural Comput. Appl. 2020, 32, 17351–17360.
26. Zhao, J.; Itti, L. shapedtw: Shape dynamic time warping. Pattern Recognit. 2018, 74, 171–184.
27. Frías-Paredes, L.; Mallor, F.; León, T.; Gastón-Romeo, M. Introducing the Temporal Distortion Index to perform a bidimensional analysis of renewable energy forecast. Energy 2016, 94, 180–194.
28. Cuturi, M.; Blondel, M. Soft-dtw: A differentiable loss function for time-series. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 894–903.
29. Lai, G.; Chang, W.C.; Yang, Y.; Liu, H. Modeling long-and short-term temporal patterns with deep neural networks. In Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, Ann Arbor, MI, USA, 8–12 July 2018; pp. 95–104.
30. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
31. Gill, P.E.; Murray, W.; Wright, M.H. Numerical Linear Algebra and Optimization; SIAM: Philadelphia, PA, USA, 2021.
32. Lee, M. Mathematical analysis and performance evaluation of the gelu activation function in deep learning. J. Math. 2023, 2023, 4229924.
33. Hodges, J.L., Jr.; Lehmann, E.L. Rank methods for combination of independent experiments in analysis of variance. In Selected Works of EL Lehmann; Springer: Berlin/Heidelberg, Germany, 2011; pp. 403–418.
34. Li, J.D. A two-step rejection procedure for testing multiple hypotheses. J. Stat. Plan. Inference 2008, 138, 1521–1527.
35. Kiriakidou, N.; Livieris, I.E.; Pintelas, P. Mutual information-based neighbor selection method for causal effect estimation. Neural Comput. Appl. 2024, 36, 9141–9155.
36. Kiriakidou, N.; Livieris, I.E.; Diou, C. C-XGBoost: A tree boosting model for causal effect estimation. In Proceedings of the IFIP International Conference on Artificial Intelligence Applications and Innovations; Springer: Berlin/Heidelberg, Germany, 2024; pp. 58–70.
37. Victoria, A.H.; Maragatham, G. Automatic tuning of hyperparameters using Bayesian optimization. Evol. Syst. 2021, 12, 217–223.
38. Li, L.; Jamieson, K.G.; DeSalvo, G.; Rostamizadeh, A.; Talwalkar, A. Hyperband: Bandit-based configuration evaluation for hyperparameter optimization. In Proceedings of the ICLR (Poster), Toulon, France, 24–26 April 2017; p. 53.
39. Nguyen, L.; Novák, V. Forecasting seasonal time series based on fuzzy techniques. Fuzzy Sets Syst. 2019, 361, 114–129.
40. Zhang, Y.; Yan, J. Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting. In Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
41. Wang, X.; Zhang, H.; Zhang, Y.; Wang, M.; Song, J.; Lai, T.; Khushi, M. Learning nonstationary time-series with dynamic pattern extractions. IEEE Trans. Artif. Intell. 2021, 3, 778–787.
42. Zhang, Y.; Wu, R.; Dascalu, S.M.; Harris, F.C., Jr. Sparse transformer with local and seasonal adaptation for multivariate time series forecasting. Sci. Rep. 2024, 14, 15909.
43. Hajirahimi, Z.; Khashei, M. Hybrid structures in time series modeling and forecasting: A review. Eng. Appl. Artif. Intell. 2019, 86, 83–106.
44. Mohammadi, B.; Mehdizadeh, S.; Ahmadi, F.; Lien, N.T.T.; Linh, N.T.T.; Pham, Q.B. Developing hybrid time series and artificial intelligence models for estimating air temperatures. Stoch. Environ. Res. Risk Assess. 2021, 35, 1189–1204.

Figure 1. Architecture of the proposed C-KAN model for time-series forecasting.

Figure 2. Qualitative forecasting results of Seq2Seq, KAN and C-KAN for Bitcoin dataset. Each figure shows the models’ inputs (past values), target values (future values) and forecasts.

Figure 3. Qualitative forecasting results of Seq2Seq, KAN and C-KAN for Gold dataset. Each figure shows the models’ inputs (past values), target values (future values) and forecasts.

Figure 4. Qualitative forecasting results of Seq2Seq, KAN and C-KAN for Synthetic dataset. Each figure shows the models’ inputs (past values), target values (future values) and forecasts.

Figure 5. Qualitative forecasting results of Seq2Seq, KAN and C-KAN for ECG5000 dataset. Each figure shows the models’ inputs (past values), target values (future values) and forecasts.

Figure 6. Qualitative forecasting results of Seq2Seq, KAN and C-KAN for Solar dataset. Each figure shows the models’ inputs (past values), target values (future values) and forecasts.

Table 1. Performance evaluation of all forecasting models on Bitcoin, Gold, Synthetic, ECG5000 and Solar datasets.

Model	Bitcoin RMSE	Bitcoin R²	Bitcoin HD	Bitcoin RAMP	Gold RMSE	Gold R²	Gold HD	Gold RAMP
Seq2Seq	1465.83	0.93	1064.62	0.03	42.78	0.80	53.59	0.02
N-BEATS	3566.24	0.62	4664.05	0.09	49.35	0.74	39.6	0.02
Conv-LSTM-Att	2062.79	0.87	1851.0	0.04	49.6	0.74	27.57	0.02
DeepTIMe	1904.4	0.89	1263.62	0.04	48.73	0.75	51.2	0.02
DLinear	1739.83	0.9	2346.11	0.04	44.81	0.79	18.33	0.02
C-KAN	1246.08	0.96	1277.95	0.02	32.0	0.89	24.62	0.01

Model	Synthetic RMSE	Synthetic R²	Synthetic HD	Synthetic RAMP	ECG5000 RMSE	ECG5000 R²	ECG5000 HD	ECG5000 RAMP
Seq2Seq	0.14	0.85	0.10	14.99	0.44	0.80	1.28	6.77
N-BEATS	0.14	0.85	0.07	18.17	0.42	0.81	0.72	3.83
Conv-LSTM-Att	0.12	0.89	0.13	17.11	0.50	0.74	2.15	4.24
DeepTIMe	-	-	-	-	-	-	-	-
DLinear	0.25	0.55	0.81	6.34	0.58	0.66	0.24	6.38
C-KAN	0.11	0.90	0.10	12.35	0.42	0.82	2.22	3.8

Model	Solar RMSE	Solar R²	Solar HD	Solar RAMP
Seq2Seq	5.34	0.84	3.51	2.9
N-BEATS	5.15	0.86	2.40	3.0
Conv-LSTM-Att	6.32	0.78	9.67	3.8
DeepTIMe	-	-	-	-
DLinear	5.77	0.85	2.38	3.8
C-KAN	5.06	0.86	3.70	3.6

Table 2. Statistical analysis: FAR and Li post hoc tests results.

Model	FAR	Li post hoc p-Value	H₀
Metric: RMSE
C-KAN	6.6	-	-
Seq2Seq	8.9	0.02149	Rejected
DLinear	13.8	0.0	Rejected
N-BEATS	16.1	0.0	Rejected
Conv-LSTM-Att	19.6	0.0	Rejected
Metric: R²
C-KAN	5.0	-	-
Seq2Seq	11.7	0.0	Rejected
Conv-LSTM-Att	14.9	0.0	Rejected
N-BEATS	16.1	0.0	Rejected
DLinear	17.3	0.0	Rejected
Metric: HD
C-KAN	11.9	-	-
Seq2Seq	12.5	0.58888	Failed to reject
DLinear	12.5	0.55825	Failed to reject
Conv-LSTM-Att	12.6	0.61707	Failed to reject
N-BEATS	15.6	0.0	Rejected
Metric: RAMP
C-KAN	8.5	-	-
DLinear	13.4	0.0	Rejected
N-BEATS	13.4	0.0	Rejected
Seq2Seq	14.7	0.0	Rejected
Conv-LSTM-Att	15.0	0.0	Rejected

Table 3. Performance evaluation of C-KAN using various architectures on Bitcoin, Gold, Synthetic, ECG5000 and Solar datasets.

Dataset	Hidden Layer	RMSE	R²	HD	RAMP
Bitcoin	[6]	1267.06	0.95	1382.12	0.03
	[24]	1198.33	0.95	1281.56	0.02
	[6,3]	1130.37	0.96	1392.63	0.02
	[8,3]	1246.08	0.96	1277.95	0.02
Gold	[6]	40.55	0.83	22.10	0.02
	[24]	36.57	0.86	36.62	0.02
	[6,3]	36.65	0.86	30.32	0.02
	[8,3]	32.00	0.89	24.62	0.01
Synthetic	[10]	0.15	0.85	0.25	16.75
	[30]	0.13	0.88	0.18	12.48
	[5,5]	0.12	0.90	0.12	13.97
	[10,10]	0.11	0.90	0.10	12.35
ECG5000	[50]	0.54	0.78	2.31	3.88
	[60]	0.58	0.79	2.54	4.37
	[40,20]	0.52	0.80	2.60	3.26
	[40,20,40]	0.42	0.82	2.22	3.80
Solar	[10]	5.06	0.86	5.70	3.7
	[20]	5.08	0.86	12.12	4.6
	[32]	5.06	0.86	12.79	4.6
	[12,6]	5.19	0.84	21.79	5.0

Table 4. Performance evaluation of KAN and C-KAN models using DILATE and MSE losses on Bitcoin, Gold, Synthetic, ECG5000 and Solar datasets.

Dataset	Model	RMSE	R²	HD	RAMP
Bitcoin	KAN (MSE)	1451.36	0.94	1430.13	0.03
	KAN (DILATE)	1439.50	0.94	709.06	0.03
	C-KAN (MSE)	1372.22	0.94	1351.21	0.03
	C-KAN (DILATE)	1246.08	0.96	1277.95	0.02
Gold	KAN (MSE)	44.96	0.78	33.36	0.02
	KAN (DILATE)	42.21	0.81	27.33	0.02
	C-KAN (MSE)	41.19	0.81	28.19	0.02
	C-KAN (DILATE)	32.0	0.89	24.62	0.01
Synthetic	KAN (MSE)	0.15	0.83	0.39	19.14
	KAN (DILATE)	0.12	0.89	0.10	13.49
	C-KAN (MSE)	0.13	0.87	0.19	14.1
	C-KAN (DILATE)	0.11	0.90	0.10	12.35
ECG5000	KAN (MSE)	0.42	0.82	2.45	3.81
	KAN (DILATE)	0.42	0.82	2.31	3.81
	C-KAN (MSE)	0.42	0.82	2.41	3.82
	C-KAN (DILATE)	0.42	0.82	2.22	3.8
Solar	KAN (MSE)	5.51	0.84	4.41	4.4
	KAN (DILATE)	5.40	0.85	5.7	3.7
	C-KAN (MSE)	5.11	0.86	3.41	4.5
	C-KAN (DILATE)	5.06	0.86	3.70	3.6

© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
