**3. Model Approach**

Figure 4 shows the basic model of the proposal. Instead of using the hydrogen flow itself as the model output, the model predicts the variation in the present flow. Moreover, as the model focuses on the electrical power produced by the fuel cell, the inputs are the present power, the desired future power, and the present $H_2$ inlet flow. The solution provided in this research lies in modeling the fuel (hydrogen) flow needed for a desired power, in order to minimize the $H_2$ outlet of the fuel cell.

**Figure 4.** General schema of the functional model.

Figure 5 shows the specific signals and their temporal instants. Using the present values of power and hydrogen flow, the model predicts the flow variation two states ahead that is needed to achieve the desired power. Because the fuel cell system can then react before the load demands the future power, this model increases the efficiency of the fuel cell.

**Figure 5.** Model approach for forecasting actual current value.
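The signal arrangement of Figure 5 can be illustrated with a short sketch that builds input and target samples from two recorded time series. It is only an illustration under stated assumptions: the function name `build_samples`, the exact input layout, and the definition of the target as the flow at the future state minus the present flow are hypothetical and are not taken from the implementation used in this work.

```python
def build_samples(power, h2_flow, horizon=2):
    """Pair each instant t with the target `horizon` states later.

    Assumed layout (hypothetical, for illustration only):
      inputs  = (power[t], power[t + horizon], h2_flow[t])
      target  = h2_flow[t + horizon] - h2_flow[t]   # flow variation
    """
    if len(power) != len(h2_flow):
        raise ValueError("power and h2_flow must have the same length")
    inputs, targets = [], []
    # The last `horizon` instants have no future value and are skipped.
    for t in range(len(power) - horizon):
        inputs.append((power[t], power[t + horizon], h2_flow[t]))
        targets.append(h2_flow[t + horizon] - h2_flow[t])
    return inputs, targets


# Usage with made-up values (not measured data).
example_inputs, example_targets = build_samples(
    [10, 20, 30, 40], [1.0, 1.5, 2.5, 4.0]
)
```

With the default horizon of two states, a series of *n* instants yields *n* − 2 samples.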

To obtain this prediction, a hybrid model was created: clustering techniques divide the data into several subsets, and several regression algorithms are then trained for each cluster. Figure 6 shows an internal representation of the hybrid model, in which each group has its own regression model. Each input sample is assigned to a specific cluster, and the output of the whole model is the output of the corresponding local model.

**Figure 6.** Internal schematic to achieve the hybrid model.
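The dispatch logic of Figure 6 can be sketched in a few lines of plain Python. The sketch assumes that clusters are represented by centroids and that a sample is assigned to the nearest centroid by Euclidean distance; the class name `HybridModel` and the representation of local models as callables are hypothetical choices made for the example, not the implementation used in this work.

```python
import math


class HybridModel:
    """Route each input sample to the local model of its cluster."""

    def __init__(self, centroids, local_models):
        # One local regression model (any callable) per cluster centroid.
        if len(centroids) != len(local_models):
            raise ValueError("one local model is required per centroid")
        self.centroids = centroids
        self.local_models = local_models

    def assign(self, sample):
        # Assumption: nearest centroid under Euclidean distance.
        return min(
            range(len(self.centroids)),
            key=lambda i: math.dist(sample, self.centroids[i]),
        )

    def predict(self, sample):
        # The output of the whole model is the output of the local model.
        return self.local_models[self.assign(sample)](sample)


# Usage with two toy clusters and two placeholder local models.
model = HybridModel(
    centroids=[(0.0, 0.0), (10.0, 10.0)],
    local_models=[lambda x: sum(x), lambda x: -1.0],
)
```

Keeping the assignment step separate from the local models means that each cluster can use a different regression algorithm, as the hybrid scheme requires.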

Figure 7 shows the flow diagram followed to create the hybrid model. In the third step, the selection of the best local model, K-Fold cross-validation is used to split each data subset (cluster data) into training and testing sets.

#### *Processes* **2019**, *7*, 825

**Figure 7.** Flowchart of the hybrid model creation phases.

Figure 8 shows this validation procedure. Once the K-Fold scheme is selected, one *k*-th of the cluster data is used for testing and the rest for training. With the training data, a regression model is created using the selected algorithm, and the testing data are used to calculate the modeled output. The real testing output and the predicted one are saved in an *Error log*. The training–testing procedure is repeated *k* times, until all the data have been used as testing data. At that point, the *Error log* covers all the cluster data and is used to calculate the error of each regression algorithm.

**Figure 8.** K-Fold training and test data selection.
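The procedure of Figure 8 can be sketched as follows. This is a minimal illustration in plain Python rather than the implementation used in this work: the function names, the interleaved assignment of samples to folds, the mean-value predictor standing in for a regression algorithm, and the mean squared error as the error measure are all assumptions made for the example.

```python
def k_fold_error_log(inputs, targets, k, fit):
    """Run k training-testing rounds and collect (real, predicted) pairs."""
    n = len(inputs)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= number of samples")
    error_log = []
    for fold in range(k):
        # Assumption: every k-th sample, offset by `fold`, is a test sample.
        test_idx = set(range(fold, n, k))
        train_x = [inputs[i] for i in range(n) if i not in test_idx]
        train_y = [targets[i] for i in range(n) if i not in test_idx]
        model = fit(train_x, train_y)  # train on the remaining data
        for i in sorted(test_idx):
            error_log.append((targets[i], model(inputs[i])))
    return error_log


def fit_mean(train_x, train_y):
    """Placeholder 'regression algorithm': always predicts the training mean."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean


def mean_squared_error(error_log):
    """Error over the whole log, i.e., over all the cluster data."""
    return sum((real - pred) ** 2 for real, pred in error_log) / len(error_log)


# Usage on toy cluster data with k = 3.
toy_inputs = [(float(i),) for i in range(6)]
toy_targets = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
log = k_fold_error_log(toy_inputs, toy_targets, k=3, fit=fit_mean)
```

Running the same function once per candidate algorithm, and comparing the resulting errors, gives one possible way to select the best local model for a cluster.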

## *3.1. Data Processing*

To prepare the dataset for the regression phase, the data are preprocessed in two steps. First, wrong samples, that is, samples with out-of-range values, are removed. The second step is normalization, which aims to reduce the training time in the subsequent regression phase. This normalization is based on the Max-Min Scaler [34], presented in Equation (2), which maps each sample value $Data_j$ to a new value $Data_{j_{new}}$ in the range from 0 to 1.

$$Data_{j_{new}} = \frac{Data_j - \min(Data)}{\max(Data) - \min(Data)}.\tag{2}$$
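The two preprocessing steps can be sketched in plain Python as follows. The function names and the per-variable `(low, high)` limits are hypothetical and serve only to illustrate the out-of-range filter and Equation (2); the actual valid ranges of the signals are not specified here.

```python
def remove_out_of_range(samples, limits):
    """Keep only samples whose values all lie inside their (low, high) limits."""
    return [
        sample
        for sample in samples
        if all(low <= value <= high for value, (low, high) in zip(sample, limits))
    ]


def min_max_scale(values):
    """Apply Equation (2): map the values of one variable to the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Equation (2) is undefined when all values are equal.
        raise ValueError("cannot scale a constant variable")
    return [(value - lowest) / (highest - lowest) for value in values]


# Usage with made-up values (not measured data).
clean = remove_out_of_range(
    [(1.0, 5.0), (-3.0, 5.0), (2.0, 50.0)],
    limits=[(0.0, 10.0), (0.0, 10.0)],
)
scaled = min_max_scale([2.0, 4.0, 6.0])  # [0.0, 0.5, 1.0]
```

Scaling is applied per variable, so the minimum and maximum in Equation (2) are taken over all samples of that variable after the wrong samples have been removed.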
