**4. Conclusions**

Accurate forecasting of an asset's price has clear practical implications. It helps participants on both the supply and demand sides reduce risk by anticipating price changes and acting in time to optimize their behavior in the relevant market, whether through positive or negative storage, substitution to and from this market, or the alteration of budget plans. More generally, it decreases uncertainty, which has adverse effects on both suppliers and consumers. Moreover, government officials can use such information for larger-scale planning, as they can anticipate price swings.

Effective forecasting of natural gas prices matters to all market participants: suppliers, distributors, consumers, investors, and regulatory agencies. It has become an increasingly important tool for stakeholders in the natural gas market, helping them make better decisions on risk management, narrow the demand-supply gap, and optimize resource utilization. For investors trading in the U.S. energy equity markets, the current boom in green energy investing poses significant hurdles. Whether and how these investors learn to cope with various information frictions, navigate broad fluctuations in market risk appetite and uncertainty, and deal with unexpected changes in energy laws and regulations will be crucial to their investment decisions [18].

In this paper, we tested the effectiveness of various machine learning algorithms in forecasting natural gas spot prices. We trained multiple machine learning models and a naïve random walk model. In the machine learning models, we used the optimal number of lagged natural gas spot prices and 21 other explanatory variables (regressors), selected based on economic theory and the relevant literature. These regressors included macroeconomic and stock market indicators, exchange rates, interest rates, the spot prices and futures contracts of West Texas Intermediate (WTI) crude oil (Cushing, Oklahoma), the corresponding futures contracts of natural gas, the momentum of the last 5 and 10 days, and the 5- and 10-day moving averages. The models were trained to forecast at horizons of one, three, five, and ten days ahead (*t* + 1, *t* + 3, *t* + 5, and *t* + 10).
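The price-derived regressors described above (lagged prices, momentum, and moving averages) can be illustrated with a minimal sketch. It is not the authors' code: the function name `build_features` is hypothetical, and momentum is assumed here to be the simple price difference over the window, a definition this section does not state.

```python
def build_features(prices, n_lags=14, horizon=1):
    """Build (rows, targets) from a list of daily spot prices.

    Each row holds, for day t: n_lags prices (p_t, p_{t-1}, ...),
    5- and 10-day momentum (ASSUMED to be p_t - p_{t-k}),
    and 5- and 10-day moving averages ending at day t.
    The target is the price `horizon` days ahead, p_{t+horizon}.
    """
    rows, targets = [], []
    # The first usable day needs n_lags prices and 10 earlier days for momentum.
    start = max(n_lags - 1, 10)
    for t in range(start, len(prices) - horizon):
        lags = [prices[t - k] for k in range(n_lags)]
        mom5 = prices[t] - prices[t - 5]
        mom10 = prices[t] - prices[t - 10]
        ma5 = sum(prices[t - 4:t + 1]) / 5
        ma10 = sum(prices[t - 9:t + 1]) / 10
        rows.append(lags + [mom5, mom10, ma5, ma10])
        targets.append(prices[t + horizon])
    return rows, targets


# Usage on synthetic prices 1, 2, ..., 30 with 3 lags and a one-day horizon.
rows, targets = build_features(list(range(1, 31)), n_lags=3, horizon=1)
```

Calling the function once per horizon (1, 3, 5, and 10) yields one training set per forecast target; the remaining 21 regressors would be appended to each row in the same way.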

The dataset included 2423 daily observations for the period from 3 December 2010 to 18 September 2020. It was divided into two subsets. The first, covering 19 November 2010 to 19 September 2019 (2180 observations), was used to train the models. The second, spanning 20 September 2019 to 18 September 2020 (the remaining 243 observations), was used to test how well the models generalize to unseen data that played no part in training. To guard against overfitting, we employed 5-fold cross-validation.
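The split described above can be sketched with index arithmetic alone. The helper names are hypothetical, and the folds are assumed to be contiguous blocks; the text does not say whether the cross-validation folds were shuffled.

```python
def chronological_split(n_obs, n_train):
    """Split observation indices into an early training part and a later test part."""
    train_idx = list(range(n_train))
    test_idx = list(range(n_train, n_obs))
    return train_idx, test_idx


def k_fold_indices(n_train, k=5):
    """Yield (fit_idx, val_idx) pairs for k contiguous folds (ASSUMED unshuffled)."""
    base, extra = divmod(n_train, k)
    start = 0
    for fold in range(k):
        # Spread any remainder over the first `extra` folds.
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        fit_idx = list(range(0, start)) + list(range(start + size, n_train))
        yield fit_idx, val_idx
        start += size


# Usage with the sample sizes reported in this section.
train_idx, test_idx = chronological_split(2423, 2180)
folds = list(k_fold_indices(len(train_idx), k=5))
```

With 2180 training observations, each of the five validation folds holds 436 observations, and the 243 test observations are never touched during fitting or validation.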

The optimal AR representation was found to be 14 lags using a linear SVM model. Next, we added all of the explanatory variables to train the 19 models. In 10 of these models, we detected overfitting; thus, they were not used in the subsequent analysis. These were the interactions linear, SVM (quadratic, cubic, fine Gaussian, medium Gaussian, and coarse Gaussian), and GPR (squared exponential, Matérn 5/2, exponential, and rational quadratic) models. The models that did not show overfitting were the random walk, linear regression, robust linear, fine tree, medium tree, coarse tree, linear SVM, boosted trees, and bagged trees models.

According to the results, the optimal model for the in-sample data at *t* + 1 is linear regression, while bagged trees are optimal for *t* + 3, *t* + 5, and *t* + 10. For the out-of-sample data, the best models are the linear SVM for *t* + 1, *t* + 5, and *t* + 10 and boosted trees for *t* + 3. These models do not overfit, since their RMSEs for the in-sample and out-of-sample data are comparable.
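The RMSE comparison behind this overfitting check, and the naïve random walk benchmark, can be sketched as follows. The function names are hypothetical, and the section gives no numerical threshold for "comparable", so the ratio below is only a diagnostic to inspect, not a decision rule taken from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def random_walk_forecast(prices, horizon=1):
    """Naive benchmark: the forecast for day t + horizon is the price at day t.

    Returns (actual, predicted) aligned so that predicted[i] forecasts actual[i].
    """
    return prices[horizon:], prices[:-horizon]


def rmse_ratio(rmse_in, rmse_out):
    """Out-of-sample RMSE divided by in-sample RMSE; values far above 1 hint at overfitting."""
    return rmse_out / rmse_in


# Usage on a short synthetic price series.
actual, predicted = random_walk_forecast([1.0, 2.0, 4.0, 7.0], horizon=1)
baseline_rmse = rmse(actual, predicted)
```

A candidate model is worth keeping only if its out-of-sample RMSE is below the random walk's at the same horizon and its `rmse_ratio` stays close to 1.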

Therefore, from our research, we conclude that the most effective methods for natural gas spot price forecasting are the linear SVM and the bagged trees.

**Author Contributions:** D.M., E.S., P.G., and T.P. contributed equally to all parts of the paper, except for funding acquisition (E.S.). All authors have read and agreed to the published version of the manuscript.

**Funding:** This research is co-financed by Greece and the European Union (European Social Fund-ESF) through the Operational Programme "Human Resources Development, Education and Lifelong Learning" in the context of the project "Strengthening Human Resources Research Potential via Doctorate Research" (MIS-5000432), implemented by the State Scholarships Foundation (IKY).

**Data Availability Statement:** The data that support the findings of this study are available from the corresponding author upon reasonable request.

**Conflicts of Interest:** The authors declare no conflict of interest.
