#### **2. Materials and Methods**

The rapid growth of virtualization technologies [4] makes them well suited to building a system of this kind. They make the developed system independent of the machine on which it runs, which provides great versatility and flexibility. In addition, these technologies allow resources to be used exclusively, since each virtualized environment contains only the modules it needs. One of the most powerful and versatile technologies in this field is *Docker* [5], which makes it easy to define each system together with the characteristics that must be taken into account. Besides making the execution of tasks more efficient, this would also provide greater security, since each machine contains only the services needed to perform the task entrusted to it.

**Citation:** Galdo, B.; Rivero, D.; Fernandez-Blanco, E. Development of a Server for the Implementation of Data Processing Pipelines and ANN Training. *Eng. Proc.* **2021**, *7*, 38. https://doi.org/10.3390/engproc2021007038

Academic Editors: Joaquim de Moura, Marco A. González, Javier Pereira and Manuel G. Penedo

Published: 18 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
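The idea that each container should hold only the modules its task needs can be sketched as a small generator that renders one minimal Dockerfile per role. The role names, the package lists, the `python:3.9-slim` base image and the per-role directory layout are all assumptions made for illustration; they are not taken from the authors' repository.

```python
# Hypothetical role -> package mapping; the actual project may use other libraries.
ROLE_PACKAGES = {
    "facade": ["flask"],
    "data_processing": ["numpy", "pandas"],
    "model_training": ["numpy", "tensorflow"],
}

# Assumption: a slim official Python image as the common base.
BASE_IMAGE = "python:3.9-slim"


def dockerfile_for(role: str) -> str:
    """Render a minimal Dockerfile for one role of the architecture."""
    packages = " ".join(ROLE_PACKAGES[role])  # unknown roles raise KeyError
    return "\n".join([
        f"FROM {BASE_IMAGE}",
        "WORKDIR /app",
        # Only this role's dependencies are installed in the image.
        f"RUN pip install --no-cache-dir {packages}",
        # Hypothetical layout: each role's code lives in a package named after it.
        f"COPY {role}/ /app/{role}/",
        f'CMD ["python", "-m", "{role}"]',
    ]) + "\n"


if __name__ == "__main__":
    for role in ROLE_PACKAGES:
        print(f"# --- {role} ---")
        print(dockerfile_for(role))
```

Because the facade image never installs the training libraries, it stays small and exposes fewer services, which is the security argument made above.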

The architecture must also support the management of different users and databases. Since this task is not computationally costly, it is grouped into a single module to make better use of the architecture's resources.
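A lightweight module that keeps users and their databases together might look like the following sketch. The class name, its methods and the choice of salted PBKDF2 password hashing are illustrative assumptions; the paper does not state how authentication is implemented.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass, field


@dataclass
class User:
    salt: bytes
    password_hash: bytes
    databases: set = field(default_factory=set)


class Facade:
    """Hypothetical cheap module: user accounts plus per-user database bookkeeping."""

    ITERATIONS = 100_000  # PBKDF2 work factor (assumed value)

    def __init__(self):
        self._users = {}  # username -> User

    def _hash(self, password, salt):
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, self.ITERATIONS)

    def register(self, username, password):
        if username in self._users:
            raise ValueError(f"user {username!r} already exists")
        salt = os.urandom(16)
        self._users[username] = User(salt, self._hash(password, salt))

    def authenticate(self, username, password):
        user = self._users.get(username)
        if user is None:
            return False
        # Constant-time comparison of the stored and recomputed hashes.
        return hmac.compare_digest(user.password_hash, self._hash(password, user.salt))

    def add_database(self, username, name):
        self._users[username].databases.add(name)

    def list_databases(self, username):
        return sorted(self._users[username].databases)
```

None of these operations needs the numerical libraries used for processing or training, which is why they can share one small module.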

#### **3. Results**

This architecture allows load balancing [6] of the different training and data processing jobs to be carried out on dedicated nodes. Nodes are activated on demand and devote all of their resources to the work they are asked to perform, without spending resources on other functionality, such as user authentication or file management, that would otherwise consume them unnecessarily. The resulting system architecture is shown in Figure 1.
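On-demand activation combined with load balancing can be sketched as a pool that starts a new node only when every running node is full, sends each job to the least-loaded node, and releases nodes once they become idle. The class, its parameters and the in-memory bookkeeping are assumptions for illustration; in a real deployment starting or releasing a node would launch or stop a container.

```python
import itertools


class NodePool:
    """Hypothetical on-demand pool with least-loaded dispatch."""

    def __init__(self, capacity_per_node, max_nodes):
        self.capacity = capacity_per_node
        self.max_nodes = max_nodes
        self.load = {}  # node_id -> number of running jobs
        self._ids = itertools.count()

    def submit(self, job_name):
        """Return the id of the node that will run job_name."""
        # Least-loaded running node, if any (ties go to the oldest node).
        node = min(self.load, key=self.load.get, default=None)
        if node is None or self.load[node] >= self.capacity:
            if len(self.load) >= self.max_nodes:
                raise RuntimeError(f"no capacity left for {job_name!r}")
            node = next(self._ids)  # "activate" a new node on demand
            self.load[node] = 0
        self.load[node] += 1
        return node

    def finish(self, node):
        """Mark one job on node as done; release the node when it becomes idle."""
        self.load[node] -= 1
        if self.load[node] == 0:
            del self.load[node]
```

Setting `capacity_per_node=1` gives the behaviour described above, where each node devotes all of its resources to a single job.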

Note that the current implementation is based on artificial neural networks (ANNs), which allow deep learning models to be implemented [7].
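To show what a user-defined ANN amounts to, the following standard-library sketch turns a list of layer sizes into a fully connected network and runs a forward pass. The layer-list format, tanh hidden activations, softmax output and Glorot uniform initialisation are assumptions chosen for illustration; the paper does not say which framework or configuration format the training module uses.

```python
import math
import random


def build_mlp(layer_sizes, seed=0):
    """Create (weights, biases) per layer; e.g. [4, 8, 3] = 4 inputs, 8 hidden, 3 outputs."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        limit = math.sqrt(6 / (n_in + n_out))  # Glorot/Xavier uniform bound
        weights = [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Forward pass: tanh on hidden layers, softmax on the output layer."""
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [math.tanh(v) for v in z]
        else:
            m = max(z)  # subtract the maximum for numerical stability
            exps = [math.exp(v - m) for v in z]
            total = sum(exps)
            x = [e / total for e in exps]
    return x
```

A production module would more likely delegate to a deep learning library; the sketch only makes explicit the structure that a user's model description has to specify.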

This architecture can be divided into a front-end based on the MVC pattern [8] and a back-end composed of three large modules, which are separated according to their expected workload. First, a Data Processing module [1] performs the operations specified by the user on the data. Second, a Model Training module [2] generates the models specified by the user and trains them with the required database. Finally, a Facade module [9] acts as a facade and performs the less expensive operations, such as user management and management of the files on the server.

**Figure 1.** Data processing and model training server architecture.
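The split of the back-end by workload can be sketched as a routing table that a front-end controller might consult to decide which module receives each request. The operation names below are hypothetical and do not come from the authors' API.

```python
from enum import Enum


class Module(Enum):
    FACADE = "facade"
    DATA_PROCESSING = "data_processing"
    MODEL_TRAINING = "model_training"


# Hypothetical operation names; the real API may differ.
ROUTES = {
    # Cheap operations stay in the facade.
    "login": Module.FACADE,
    "create_user": Module.FACADE,
    "upload_file": Module.FACADE,
    "list_files": Module.FACADE,
    # Expensive operations go to dedicated modules.
    "filter": Module.DATA_PROCESSING,
    "normalize": Module.DATA_PROCESSING,
    "create_model": Module.MODEL_TRAINING,
    "train": Module.MODEL_TRAINING,
}


def route(operation):
    """Return the back-end module responsible for an operation."""
    try:
        return ROUTES[operation]
    except KeyError:
        raise ValueError(f"unknown operation: {operation!r}") from None
```

Keeping the mapping in one table makes it easy to move an operation to another module if its cost turns out to be higher than expected.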

A possible implementation of this architecture is available in the GitHub repository https://github.com/braiscgaldo/NIR-Lab-2.0 (accessed on 3 July 2021).

#### **4. Discussion**

A design has been defined for a server that performs data processing and model training tasks in a distributed, on-demand manner. This offers several advantages over other systems such as Weka, which performs these tasks in a single instance; because that instance must handle every functionality in the system, the resources of the machine on which it runs become depleted.

The approach can also run on cloud services such as AWS [10], Azure [11], or Google Cloud [12]. The architecture allows nodes to be replicated as needed so that each data processing or model training job runs on dedicated resources, which is a major advantage over desktop applications whose only source of computing power is the computer they run on.
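Replicating nodes as needed can be sketched as a function that sizes each worker service from its queue of pending jobs and emits a Compose-style `services` mapping using the `deploy.replicas` key. The image names, the jobs-per-replica ratio and the replica caps are hypothetical values chosen for illustration.

```python
import json
import math


def replicas_needed(pending_jobs, jobs_per_replica=1, max_replicas=8):
    """Number of worker replicas for a queue of pending jobs (0 when idle)."""
    if pending_jobs <= 0:
        return 0
    return min(max_replicas, math.ceil(pending_jobs / jobs_per_replica))


def compose_services(pending_processing, pending_training):
    """Build a Compose-style 'services' mapping; image names are hypothetical."""
    return {
        "services": {
            # The facade is lightweight and always running.
            "facade": {"image": "nirlab/facade", "deploy": {"replicas": 1}},
            "data_processing": {
                "image": "nirlab/data-processing",
                "deploy": {"replicas": replicas_needed(pending_processing)},
            },
            "model_training": {
                "image": "nirlab/model-training",
                # Training nodes are heavier, so they get a lower cap (assumed).
                "deploy": {"replicas": replicas_needed(pending_training, max_replicas=4)},
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(compose_services(pending_processing=3, pending_training=10), indent=2))
```

The same sizing logic could feed the scaling interface of a managed container service on any of the cloud providers mentioned above.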

#### **5. Future Work**

This project presents numerous avenues for future work. One is to broaden the range of supported machine learning models: the current set, which consists only of ANNs, could be extended in a straightforward way to other algorithms such as SVM, KNN, RF, or LDA.
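One way to make such an extension straightforward is a model registry in which each algorithm is added under a user-facing name. The sketch below registers a minimal KNN classifier; the registry, decorator and class names are hypothetical. In practice one would more likely wrap existing implementations such as scikit-learn's `SVC`, `KNeighborsClassifier`, `RandomForestClassifier` or `LinearDiscriminantAnalysis`.

```python
import math
from collections import Counter

MODEL_REGISTRY = {}  # user-facing name -> model class


def register(name):
    """Class decorator that adds a model type to the registry."""
    def decorator(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


def create_model(name, **params):
    if name not in MODEL_REGISTRY:
        raise ValueError(f"unsupported model {name!r}; available: {sorted(MODEL_REGISTRY)}")
    return MODEL_REGISTRY[name](**params)


@register("knn")
class KNNClassifier:
    """Minimal k-nearest-neighbours classifier (Euclidean distance, majority vote)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # Indices of the k training points closest to x.
            nearest = sorted(range(len(self.X)), key=lambda i: math.dist(x, self.X[i]))[: self.k]
            preds.append(Counter(self.y[i] for i in nearest).most_common(1)[0][0])
        return preds
```

Adding SVM, RF or LDA would then mean registering one more class with the same `fit`/`predict` interface, without touching the rest of the training module.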

Another is interactive data processing, in which users can visualize at each step how the variables they have defined behave.
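Interactive processing could be built on a pipeline runner that applies the user's steps one at a time and yields a summary after each, so a front-end can plot the intermediate state. The step functions, the `(name, function, parameters)` step format and the summary fields are assumptions for illustration.

```python
import statistics


def normalize(values):
    """Min-max scale values to [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo or 1.0  # avoid division by zero for constant input
    return [(v - lo) / span for v in values]


def moving_average(values, window=3):
    """Simple moving average; the output is shorter by window - 1 points."""
    return [statistics.fmean(values[i:i + window]) for i in range(len(values) - window + 1)]


def run_interactively(values, steps):
    """Apply (name, func, params) steps one by one, yielding a summary after each."""
    for name, func, params in steps:
        values = func(values, **params)
        summary = {
            "n": len(values),
            "mean": statistics.fmean(values),
            "min": min(values),
            "max": max(values),
        }
        yield name, summary, values
```

Because the runner is a generator, the front-end can pause after any step, display the summary or the values, and let the user adjust the remaining steps.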

**Author Contributions:** Conceptualization, B.G.; methodology, B.G.; software, B.G.; validation, B.G.; formal analysis, B.G.; investigation, B.G.; resources, B.G., D.R., E.F.-B.; data curation, B.G.; writing original draft preparation, B.G., D.R., E.F.-B.; writing—review and editing, B.G., D.R., E.F.-B.; visualization, B.G.; supervision, B.G., D.R., E.F.-B.; project administration, B.G.; funding acquisition, B.G., D.R., E.F.-B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was partially funded by the Galician Government and EFRD funds (ED431G/01).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** An implementation can be found at https://github.com/braiscgaldo/NIR-Lab-2.0 (accessed on 3 July 2021).

**Acknowledgments:** The authors would like to thank the RNASA-IMEDIR group for its support.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


