*2.3. Modeling*

Once data are obtained, fused, preprocessed and curated, the modeling phase entails the extraction of knowledge by constructing a model that characterizes the distribution of such data or their evolution over time. This knowledge can be distilled for different purposes: to represent unsupervised data in a more valuable manner (as in, e.g., clustering or manifold learning); to discover patterns relating the input data to a set of supervised outputs (classification or regression, respectively) so that unseen data observations can be labeled automatically; to predict future values from previous ones (time series forecasting); or to inspect the output produced by a model when processing input data (simulation). To this end, data-based modeling often resorts to machine learning algorithms, which allow the modeling process itself to be automated.

The above purposes can serve as a criterion to discriminate among algorithmic approaches for data-based modeling. However, when the goal is to model data interactions within complex systems such as transportation networks, the modeling choice often resorts to ensembles of different learner types. For instance, when applying regression models to road traffic forecasting, a first clustering stage is often advisable to unveil typical patterns in the historical traffic profiles and to feed them as priors to the subsequent predictive model [35–37]. When it comes to model actionability, a key feature of this stage is the *generalization* of the developed model to unseen data. This characteristic makes a model useful beyond the data on which it was trained, which implies that design efforts should not be devoted solely to achieving a marginally superior performance, but also to making the model useful in other spatial or temporal circumstances. Good generalization properties can be pursued by diverse means, which often depend on the modeling purpose at hand (e.g., cross-validation, regularization, or the use of ensembles in predictive modeling). In essence, the design goal is to find a trade-off between performance (capturing much of the intrinsic variance of the data) and generalization (avoiding a model overfitted to a particular training set). This aspect becomes especially relevant when data modeling is done on time-varying data produced by dynamic phenomena. ITS are, in point of fact, complex scenarios subject to strong sources of non-stationarity, thereby calling for utmost attention to this aspect.
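The performance–generalization trade-off just described can be made concrete with a minimal sketch. The snippet below (assuming NumPy; the data are synthetic and all function names are illustrative, not drawn from any of the cited works) couples two of the mechanisms mentioned above: it selects the regularization strength of a ridge regressor by k-fold cross-validation, so that the chosen model is the one with the lowest *estimated* error on unseen data rather than the lowest training error:

```python
import numpy as np

def ridge_fit(X, y, alpha):
    # closed-form ridge regression: w = (X^T X + alpha * I)^-1 X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

def cv_mse(X, y, alpha, k=5, seed=0):
    # k-fold cross-validation estimate of the generalization error
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = ridge_fit(X[train], y[train], alpha)
        errs.append(np.mean((X[test] @ w - y[test]) ** 2))
    return float(np.mean(errs))

# synthetic data: a noisy linear relation standing in for real traffic features
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + 0.5 * rng.normal(size=200)

# pick the regularization strength with the lowest cross-validated error
alphas = [0.01, 0.1, 1.0, 10.0]
best_alpha = min(alphas, key=lambda a: cv_mse(X, y, a))
```

In practice, libraries such as scikit-learn provide this functionality out of the box; the point here is only to make explicit how regularization and cross-validation jointly steer the model away from overfitting a particular training set.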

The complexity met in traffic and transportation operations is usually addressed with heterogeneous modeling approaches that complement each other to improve accuracy [38–40]. This can be done either by comparing different models and selecting the most appropriate one for each case, or by combining different models to produce the final outcome. Additionally, in some fields of ITS, such as traffic modeling, physical (namely, theory- or simulation-based) models have been available for decades. Their integration into data-based modeling workflows, considering the knowledge they can provide, can become crucial for a manifold of purposes, e.g., to enforce traffic theory awareness in models learned from ITS data. Indeed, the hybridization of physical and data-based models has a yet-to-be-developed potential that has only been timidly explored in some recent works [41–43].
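As a minimal illustration of the "select versus combine" alternative mentioned above (all names and the synthetic series are hypothetical, not taken from the cited references), the following sketch evaluates two simple forecasters on a rolling one-step-ahead basis and either selects the one with the lowest mean absolute error or blends both with inverse-error weights:

```python
import math
from statistics import mean

def last_value(history):
    # naive forecaster: repeat the most recent observation
    return history[-1]

def moving_average(history, w=4):
    # smoother forecaster: average of the last w observations
    return mean(history[-w:])

def select_or_blend(series, models, mode="select", warmup=8):
    # rolling one-step-ahead evaluation: each model forecasts series[t]
    # from series[:t], accumulating absolute errors along the way
    errors = {name: [] for name in models}
    for t in range(warmup, len(series)):
        for name, f in models.items():
            errors[name].append(abs(f(series[:t]) - series[t]))
    if mode == "select":
        # model selection: keep the single best-performing model
        return min(models, key=lambda n: mean(errors[n]))
    # model combination: inverse-error weights, normalized to sum to one
    inv = {n: 1.0 / (mean(errors[n]) + 1e-9) for n in models}
    total = sum(inv.values())
    return {n: v / total for n, v in inv.items()}

# illustrative periodic "traffic flow" series with mild deterministic noise
series = [10 + 3 * math.sin(2 * math.pi * t / 12) + 0.2 * ((t * 7) % 5 - 2)
          for t in range(60)]
models = {"last_value": last_value, "moving_average": moving_average}
choice = select_or_blend(series, models, mode="select")
weights = select_or_blend(series, models, mode="blend")
```

The same skeleton extends to heterogeneous learners (or to a physical model sitting alongside data-based ones): only the entries of the `models` dictionary change, while the selection or weighting logic is reused as-is.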

Interestingly, complex data-driven modeling solutions to transportation phenomena have been numerous and resourceful, ranging from modular structures to model combinations, surrogate modeling [44], and so on. Regardless of the approach, the literature emphasizes the critical issue of model hyperparameter optimization, using for example nature-inspired algorithms, namely Evolutionary Computation or Swarm Intelligence [39,45]. Assuming that a feasible and acceptable solution exists to the problem of selecting the parameters of a data-driven model, when dealing with complex modeling structures this task should be conducted automatically by searching the hyperparameter space, usually guided by the model's predictive error. It should be noted that the greater the number of models involved, the more difficult the optimization task becomes. Moreover, when relying on nature-inspired stochastic approaches, full determinism in the solution and convergence stability cannot be formally guaranteed [5].
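A minimal sketch of such a nature-inspired search is given below: a (μ+λ) evolution strategy over a box-bounded hyperparameter space, where a toy quadratic surrogate stands in for the model's actual validation error (all names, bounds and the surrogate are illustrative assumptions, not taken from the cited works):

```python
import random

def evolve(fitness, bounds, pop_size=10, generations=20, seed=0):
    # minimal (mu + lambda) evolution strategy: each parent spawns one
    # Gaussian-mutated child, and the best pop_size individuals among
    # parents and children survive; fitness = validation error to minimize
    rnd = random.Random(seed)
    pop = [[rnd.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for parent in pop:
            # mutate each coordinate with noise scaled to its range,
            # clipping the result back into the feasible box
            child = [min(max(x + rnd.gauss(0, 0.1 * (hi - lo)), lo), hi)
                     for x, (lo, hi) in zip(parent, bounds)]
            children.append(child)
        pop = sorted(pop + children, key=fitness)[:pop_size]
    return pop[0]

# toy surrogate of a model's validation error over two hyperparameters
# (e.g., a regularization strength and a kernel width), minimized at (0.3, 2.0)
err = lambda h: (h[0] - 0.3) ** 2 + (h[1] - 2.0) ** 2
best = evolve(err, bounds=[(0.0, 1.0), (0.0, 5.0)])
```

Note that the search is seeded here for reproducibility; without a fixed seed, different runs may return different hyperparameter configurations, which is precisely the lack of determinism noted above.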
