**1. Introduction**

Data processing techniques [1] and machine learning [2] aim to detect patterns in a dataset in order to produce estimates from it. These technologies are experiencing a great boom, driven by the optimization of their algorithms and the notable increase in the computational capacity of modern systems.

Both database processing and machine learning model training are complex and computationally expensive tasks. When very large databases must be processed or complex models developed, dedicated hardware is commonly used to speed up these tasks. Otherwise, they can take a long time on a conventional computer, and the sustained computational load may even damage some of its components.

In addition, defining processing pipelines or models can be very difficult for people who are not experts in the field. To address this, some tools allow these tasks to be carried out visually. One example is Weka [3], which makes these tasks simple to perform. However, this application does not support distributing its execution across different machines.

With these two goals in mind, ease of use and scalability, this work proposes the architecture of a distributed system for database processing and machine learning model training. In this way, the resources of the machines on which the processes run are dedicated specifically to these tasks.
