**1. Introduction**

Sentiment analysis is the process of uncovering sentiment from information. Sentiment can refer to polarity [**?** ], whether coarse or fine grained [**?** ], or directly to emotions [**???** ]. The most common source of information for sentiment analysis is Online Social Networks (OSNs) [**? ?** ]. User-generated content poses a distinctive combination of complexity and challenge for automated sentiment classification.

Automated classification refers to methods that identify and classify information through an inference process. Machine Learning (ML) studies such methods, which can generally be separated into three parts: modeling, learning, and classification. Given a classification task, an ML method creates a model of the data, learns from a set of pre-classified examples, and then performs the classification the task requires. Ensemble learning refers to combining a finite number of ML systems to improve classification results [**?** ].
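In its simplest form, combining a finite number of models can mean averaging their predicted class probabilities (soft voting). The NumPy sketch below uses three hypothetical classifiers with made-up outputs; it only illustrates the general idea and is not the ensemble used in this work.

```python
import numpy as np

def average_ensemble(prob_list):
    """Combine a finite list of models' probability outputs by unweighted averaging."""
    stacked = np.stack(prob_list)  # shape: (n_models, n_samples, n_classes)
    return stacked.mean(axis=0)    # shape: (n_samples, n_classes)

# Three hypothetical classifiers scoring two samples over three classes
p1 = np.array([[0.7, 0.2, 0.1], [0.2, 0.5, 0.3]])
p2 = np.array([[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]])
p3 = np.array([[0.2, 0.5, 0.3], [0.1, 0.4, 0.5]])

avg = average_ensemble([p1, p2, p3])
print(avg.argmax(axis=1))  # prints [0 2]
```

For the second sample, `p1` alone would pick class 1, while the averaged prediction follows the majority of the evidence and picks class 2.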

Various ML systems exist, most frequently characterized by the models and training methods they employ. Artificial Neural Networks (ANNs) are one type of ML system [**?** ]. These networks are organized into three types of layers: input, hidden, and output. Data is fed into the model through the input layer, transformed by the hidden layer, and classified in the output layer. Each layer consists of a set of nodes, or artificial neurons, connected to the next layer; the weights of these connections are the model parameters, which are (re)calculated during training. When an ANN has multiple hidden layers, it is referred to as a Deep Neural Network (DNN) [**?** ].
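As a minimal illustration of this layered structure, the NumPy sketch below runs a forward pass through a network with two hidden layers, i.e. a small DNN. The layer sizes, ReLU activations, and random initialisation are hypothetical choices, and training (the step that updates the weights) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def init_layer(n_in, n_out):
    # Small random weights and zero biases (a hypothetical initialisation scheme)
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

# input (8 features) -> hidden (16) -> hidden (16) -> output (4)
sizes = [8, 16, 16, 4]
layers = [init_layer(a, b) for a, b in zip(sizes[:-1], sizes[1:])]

def forward(x):
    h = x
    for W, b in layers[:-1]:
        h = relu(h @ W + b)  # each hidden layer transforms the previous representation
    W, b = layers[-1]
    return h @ W + b         # output layer: one raw score (logit) per class

x = rng.normal(size=(5, 8))  # a batch of 5 hypothetical inputs
out = forward(x)
print(out.shape)  # prints (5, 4)
```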

DNNs have been widely used in computer vision [**???** ], where the goal of classification is to identify or detect objects, items, or features in an image. When the goal is to detect multiple objects in an image, the task is considered multilabel: a single input may be assigned more than one label. The same formulation extends from computer vision to text analysis. In emotion-related classification, a textual input can convey one or multiple emotions.
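In practice, the difference between single-label and multilabel emotion classification often lies in the output layer. The sketch below uses a hypothetical four-emotion label set and made-up output scores for one text. It contrasts a softmax, which couples the classes and yields exactly one emotion, with independent per-label sigmoids thresholded at 0.5 (a common but assumed threshold), which can yield several emotions at once.

```python
import numpy as np

emotions = ["anger", "fear", "joy", "sadness"]   # hypothetical label set
logits = np.array([1.2, -0.4, -2.0, 0.9])         # hypothetical output-layer scores

# Single-label (multiclass): softmax over all classes, keep the top one
softmax = np.exp(logits - logits.max())
softmax /= softmax.sum()
single = emotions[int(softmax.argmax())]

# Multilabel: an independent sigmoid per label, keep every label above threshold
sigmoid = 1.0 / (1.0 + np.exp(-logits))
multi = [e for e, p in zip(emotions, sigmoid) if p >= 0.5]

print(single, multi)  # prints anger ['anger', 'sadness']
```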

Traditional sentiment analysis focuses on polarity or on a single emotion per input. Our main goal is to demonstrate the effectiveness of ensemble learning in text-based multilabel classification. In addition, we aim to encourage researchers to consider multilabel emotion classification as a significant aspect of sentiment analysis.

Our contributions are as follows. We create and present five multilabel classification architectures and two ensembles: a baseline stacked ensemble and a weighted ensemble whose weights are assigned by differential evolution. We then highlight the effectiveness of ensemble learning on modern multilabel emotion datasets. Our results show that ensemble learning can be more effective than single DNNs in multilabel emotion classification. Finally, we include a high-level description of the most commonly used hidden layers to introduce readers to deep-learning architectures.
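A weighted ensemble whose weights are found by differential evolution (DE) can be sketched as follows. This is not the implementation used in this work: the objective (micro-averaged F1 on held-out validation predictions), the DE/rand/1/bin variant, the population size, mutation factor `F`, crossover rate `CR`, the 0.5 decision threshold, and the helper names are all hypothetical, and the three "models" are synthetic probability matrices. SciPy's `scipy.optimize.differential_evolution` is a ready-made alternative if that dependency is acceptable.

```python
import numpy as np

def micro_f1(y_true, y_pred):
    # Micro-averaged F1 over all samples and labels
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def normalize(w):
    # Rescale non-negative weights to sum to 1 (uniform if all are zero)
    s = w.sum()
    return w / s if s > 0 else np.full_like(w, 1.0 / len(w))

def weighted_predict(probs, w, threshold=0.5):
    # probs: (n_models, n_samples, n_labels); weighted average, then per-label threshold
    avg = np.tensordot(normalize(w), probs, axes=1)
    return (avg >= threshold).astype(int)

def de_weights(probs, y_true, pop_size=20, generations=50, F=0.8, CR=0.9, seed=0):
    """DE/rand/1/bin search for ensemble weights that maximise micro-F1."""
    rng = np.random.default_rng(seed)
    m = probs.shape[0]
    pop = rng.uniform(0.0, 1.0, size=(pop_size, m))
    fit = np.array([micro_f1(y_true, weighted_predict(probs, p)) for p in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals, none equal to i
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), 0.0, 1.0)
            # Binomial crossover, forcing at least one gene from the mutant
            cross = rng.random(m) < CR
            cross[rng.integers(m)] = True
            trial = np.where(cross, mutant, pop[i])
            # Greedy selection: keep the trial if it scores at least as well
            f = micro_f1(y_true, weighted_predict(probs, trial))
            if f >= fit[i]:
                pop[i], fit[i] = trial, f
    best = int(np.argmax(fit))
    return normalize(pop[best]), float(fit[best])

# Hypothetical validation data: 3 models, 100 texts, 4 emotion labels
rng = np.random.default_rng(1)
y_val = rng.integers(0, 2, size=(100, 4))
good = 0.05 + 0.9 * y_val            # confident and correct
noisy = rng.uniform(size=(100, 4))   # uninformative
wrong = 1.0 - good                   # confidently wrong
weights, score = de_weights(np.stack([good, noisy, wrong]), y_val)
print(weights.round(3), score)
```

Because the search only needs each model's stored validation probabilities, the base networks do not have to be retrained while the weights are being optimised.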

The remainder of this paper is organized as follows. Section **??** reviews background literature alongside state-of-the-art ensemble publications. Section **??** presents our diverse DNN architectures and their individual components in detail. Section **??** describes the ensemble methods we employed and some of their key sub-components. Section **??** presents the datasets we used and some of their properties. Section **??** details our results and potential improvements. Section **??** concludes our study with a summary and directions for future work.
