**4. Ensembles**

We previously mentioned that one way to counter overfitting is to train multiple models and combine their outputs. Ensemble learning combines single-model outputs to improve prediction accuracy and generalization, and it addresses three key aspects of learning: statistical, computational and representational [**?**]. From a statistical perspective, combining multiple models reduces the risk of relying on a single model trained on biased or unrepresentative data. From a computational perspective, most learning algorithms perform local searches that can be confined to a poor local optimum; an ensemble of searches launched from different random starting points is less likely to be trapped in this way. From a representational perspective, a single hypothesis rarely matches the target function exactly, but an aggregation of multiple hypotheses, as found in ensembles, can approximate it more closely.
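To make the statistical and representational arguments concrete, the following minimal sketch (our illustration, not code from the cited work) averages the outputs of several simulated noisy hypotheses and compares the squared error of a single model against that of the ensemble mean; the target function, noise level and model count are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    """The (unknown) target function each hypothesis tries to approximate."""
    return np.sin(x)

x = np.linspace(0, np.pi, 100)

# Simulate five imperfect hypotheses: each one is the target corrupted by
# independent noise, standing in for models trained on different biased samples.
predictions = np.stack(
    [target(x) + rng.normal(0.0, 0.3, x.shape) for _ in range(5)]
)

# The ensemble output is the plain average of the individual predictions.
ensemble = predictions.mean(axis=0)

single_mse = np.mean((predictions[0] - target(x)) ** 2)
ensemble_mse = np.mean((ensemble - target(x)) ** 2)
print(f"single-model MSE: {single_mse:.4f}")
print(f"ensemble MSE:     {ensemble_mse:.4f}")
```

Because the individual errors are independent, averaging shrinks their variance, so the ensemble's error is consistently below that of any single hypothesis.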

We present two ensemble architectures, stacked and weighted [**?**]. Other popular ensemble methods include AdaBoost, Random Forest and Bagging [**?**]. Stacked ensembles are among the simplest yet most effective ensemble methods and are widely used in a variety of applications [**? ?**]. The stacked ensemble therefore serves as our baseline, against which we compare our proposed weighted ensemble, whose weights are optimized by differential evolution, a meta-heuristic optimization method. Meta-heuristic weighted ensembles have achieved remarkable results in single-label text classification [**? ?**].
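As a sketch of the weighted approach, the example below optimizes the combination weights of several base models with SciPy's `differential_evolution`. The validation data, tensor shapes and objective (negative validation accuracy) are hypothetical stand-ins, since the paper's actual base models and loss are not specified in this section.

```python
import numpy as np
from scipy.optimize import differential_evolution

# Hypothetical validation set: `probas` holds each base model's predicted
# class probabilities, `y_val` the true labels. Shapes are illustrative.
rng = np.random.default_rng(1)
n_models, n_samples, n_classes = 3, 200, 4
probas = rng.dirichlet(np.ones(n_classes), size=(n_models, n_samples))
y_val = rng.integers(0, n_classes, size=n_samples)

def neg_accuracy(weights):
    # Normalize so the weights form a convex combination of the models
    # (epsilon guards against an all-zero candidate).
    w = weights / (weights.sum() + 1e-12)
    combined = np.tensordot(w, probas, axes=1)  # -> (n_samples, n_classes)
    return -np.mean(combined.argmax(axis=1) == y_val)

# Differential evolution searches the weight space for the combination
# that maximizes validation accuracy (i.e., minimizes its negative).
result = differential_evolution(
    neg_accuracy, bounds=[(0.0, 1.0)] * n_models, seed=1, tol=1e-7
)
best_w = result.x / result.x.sum()
print("optimized weights:  ", best_w.round(3))
print("validation accuracy:", -result.fun)
```

In a stacked ensemble, by contrast, the base-model outputs would instead be fed as features to a trained meta-learner rather than combined through explicitly optimized weights.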
