**2. Problem Statement of Machine Learning**

**Definition 1.** *A set of computational procedures that transforms a vector* **x** *from an input space* X *to a vector* **y** *from an output space* Y*, for which no algebraic equation* **y** = **f**(**x**) *is available, is called an* **unknown function.**

For example, the system of ordinary differential equations **x**˙ = **f**(**x**) defines an unknown function from a vector of initial conditions **x**(0) to a vector of solutions as functions of time **x**(*t*, **x**(0)), if no general solution of this differential equation is known.
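This black-box character of the map x(0) → x(*t*) can be illustrated with a minimal sketch: the output is produced only by numerical integration, never by a closed-form expression. The right-hand side, time horizon, and step count below are illustrative assumptions, not taken from the paper.

```python
def f(x):
    # Right-hand side of dx/dt = f(x); the linear decay -x is an arbitrary example.
    return [-xi for xi in x]

def alpha(x0, t=1.0, steps=1000):
    """Black-box map x(0) -> x(t), realized by explicit Euler integration."""
    h = t / steps
    x = list(x0)
    for _ in range(steps):
        dx = f(x)
        x = [xi + h * dxi for xi, dxi in zip(x, dx)]
    return x

# No algebraic equation y = f(x) is used to obtain y; only the procedure above.
y = alpha([1.0, 2.0])
```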

The unknown function between input vector **x** and output vector **y** is defined as

$$\mathbf{y} = \boldsymbol{\alpha}(\mathbf{x}).\tag{1}$$

Then for differential equations without general solutions, an unknown function has a form

$$
\mathbf{x}(t, \mathbf{x}^0) = \boldsymbol{\alpha}(\mathbf{x}^0). \tag{2}
$$

**Definition 2.** *A work area is a subset of the input vector space where the input vectors surely exist and which is used for solving the problem.*

The unknown function can also be realized by physical equipment or an experiment. In that case the unknown function is called a black box, but it is still described by (1).

Let a set of input vectors be determined in the work area

$$\tilde{\mathbb{X}} = \{ \mathbf{x}^1, \dots, \mathbf{x}^N \} \subseteq \mathbb{X}. \tag{3}$$

For every input vector, the output vector is determined by the unknown function (1):

$$\tilde{\mathbb{Y}} = \{ \mathbf{y}^1 = \boldsymbol{\alpha}(\mathbf{x}^1), \dots, \mathbf{y}^N = \boldsymbol{\alpha}(\mathbf{x}^N) \} \subseteq \mathbb{Y}. \tag{4}$$

**Definition 3.** *A pair of sets,*

$$(\tilde{\mathbb{X}}, \tilde{\mathbb{Y}}), \tag{5}$$

*is called a training set.*
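Definition 3 can be sketched directly: sample N input vectors from the work area and evaluate the unknown function on each, yielding the pair of sets (5). The work area, the stand-in for the unknown function, and N below are illustrative assumptions.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def alpha(x):
    # Stand-in for the unknown function (1); a simple nonlinear map for illustration.
    return [x[0] * x[1], x[0] + x[1]]

N = 5
work_area = [(-1.0, 1.0), (-1.0, 1.0)]  # box bounds for each input coordinate

# X~ : input vectors sampled from the work area, as in (3)
X_train = [[random.uniform(lo, hi) for lo, hi in work_area] for _ in range(N)]
# Y~ : corresponding outputs of the unknown function, as in (4)
Y_train = [alpha(x) for x in X_train]
```

Together, `(X_train, Y_train)` is the training set (5).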

It is known that machine learning methods divide into supervised and unsupervised ones. An **unsupervised machine learning problem** can be formulated as follows: for some unknown function (1) and a small positive value *δ*, it is necessary to find a function

$$\mathbf{y} = \beta(\mathbf{x}, \mathbf{q}),\tag{6}$$

where **q** is a vector of parameters, $\mathbf{q} = [q_1 \; \dots \; q_p]^T$, such that $\forall \mathbf{x} \in \mathbb{X}$

$$\|\mathbf{y} - \beta(\mathbf{x}, \mathbf{q})\| \le \delta. \tag{7}$$
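Condition (7) cannot be verified at every point of the work area, so in practice it is estimated on a dense sample. The sketch below, with an assumed unknown function, an assumed candidate model, and a one-dimensional grid, approximates the sup-norm error over the work area.

```python
def alpha(x):
    # The "unknown" function, assumed here only for illustration.
    return 2.0 * x + 1.0

def beta(x, q):
    # Candidate model (6) with parameter vector q = [q1, q2].
    return q[0] * x + q[1]

q = [2.0, 1.0]   # suppose these parameter values were found by a learner
delta = 1e-6

grid = [i / 100.0 for i in range(101)]            # dense sample of X = [0, 1]
max_err = max(abs(alpha(x) - beta(x, q)) for x in grid)
ok = max_err <= delta                             # empirical check of (7)
```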

A **supervised machine learning problem** can then be formulated as follows: for some unknown function (1) and a small positive value *δ*, it is necessary to determine a positive value *ε*, to build a training set (5), and to find a function (6) such that if the total error on the training set is less than the given value *ε*,

$$\sum_{i=1}^{N} \|\mathbf{y}^{i} - \beta(\mathbf{x}^{i}, \mathbf{q})\| \le \varepsilon,\tag{8}$$

then for any $\mathbf{x}^*$ from the work area that is not included in the training set, $\mathbf{x}^* \in \mathbb{X}$ and $\mathbf{x}^* \notin \tilde{\mathbb{X}}$, the following inequality holds

$$\|\mathbf{y}^* - \beta(\mathbf{x}^*, \mathbf{q})\| \le \delta,\tag{9}$$

where $\mathbf{y}^* = \boldsymbol{\alpha}(\mathbf{x}^*)$.

Here the function *β*(**x**, **q**) includes a parameter vector **q**. In many approaches the structure of the function is defined beforehand, on the basis of experience or intuition, and only the values of some parameters have to be found. For example, an artificial neural network [3–5], which is often used for solving machine learning problems, has a fixed structure and a large number of unknown parameters. In contrast, symbolic regression methods [6–8] allow searching for both the structure of the function and its parameters.
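The fixed-structure case can be sketched end to end against (8) and (9): choose a structure for *β*, fit its parameters **q** on a training set, then check the error at a point *x*\* outside it. The unknown function, the chosen linear structure, and the sample points below are illustrative assumptions.

```python
def alpha(x):
    # The "unknown" function generating the data; assumed for illustration.
    return 3.0 * x - 2.0

def beta(x, q):
    # Chosen model structure (6); only the parameters q remain to be found.
    return q[0] * x + q[1]

X_train = [0.0, 0.25, 0.5, 0.75, 1.0]
Y_train = [alpha(x) for x in X_train]

# Ordinary least squares for the 1-D linear model (closed-form solution).
n = len(X_train)
sx = sum(X_train); sy = sum(Y_train)
sxx = sum(x * x for x in X_train)
sxy = sum(x * y for x, y in zip(X_train, Y_train))
q1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
q0 = (sy - q1 * sx) / n
q = [q1, q0]

# Total error on the training set, as in (8).
train_err = sum(abs(y - beta(x, q)) for x, y in zip(X_train, Y_train))
# Error at a point in the work area but not in the training set, as in (9).
x_star = 0.6
test_err = abs(alpha(x_star) - beta(x_star, q))
```

Because the chosen structure matches the data exactly here, both errors vanish; in general a small training error (8) is only evidence, not a guarantee, for the generalization bound (9).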
