#### *1.1. Personality Assessment*

Personality assessment is a well-defined process that can help reveal how a person may react to different, unexpected situations [5]. Several personality tests are available, such as HEXACO-60 [6], the Myers-Briggs Type Indicator [7], the Enneagram of Personality [8] and the NEO Personality Inventory (NEO-PI-R) [9]. Goldberg's 100 Unipolar Markers test [10] is yet another test; it consists of 100 adjectives, or markers, each of which the subject must rate according to how accurately it describes them, from 1 (*Extremely Inaccurate*) to 9 (*Extremely Accurate*). The full set of markers includes adjectives such as *talkative*, *sympathetic*, *careless*, *envious* and *deep*. Goldberg's test measures five domains, namely *Surgency*, *Agreeableness*, *Conscientiousness*, *Emotional Stability* and *Intellect*. Alternative formulations of these domains have also been proposed. The OCEAN model, commonly referred to as the Big Five, consists of five factors, namely *Openness to experience*, *Conscientiousness*, *Extraversion*, *Agreeableness* and *Neuroticism* [11,12].


Goldberg's test requires the subject to rate all 100 unipolar markers. Gerard Saucier substantially reduced this set with the Mini-Marker test, which uses a subset of 40 markers to assess the Big Five with acceptable performance, relying on less difficult markers and yielding lower inter-scale correlations [13]. Saucier's test uses the same rating scale and consists of five disjoint sets of eight unipolar markers each, one set per domain.
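
The scoring procedure is not spelled out here. A common convention for unipolar marker inventories, assumed in the minimal sketch below, is to reverse-key negatively worded markers (on the 1 to 9 scale, a rating r becomes 10 - r) and then average each domain's markers. The scoring key is hypothetical and truncated to two markers per domain (the actual Mini-Marker key has eight per domain); `SCORING_KEY` and `score_domains` are illustrative names.

```python
from statistics import mean

# Hypothetical, truncated scoring key: each entry is (marker, reverse_keyed).
# The real Mini-Marker test uses eight markers per domain.
SCORING_KEY = {
    "Surgency": [("talkative", False), ("quiet", True)],
    "Agreeableness": [("sympathetic", False), ("kind", False)],
    "Conscientiousness": [("careless", True), ("organized", False)],
    "Emotional Stability": [("envious", True), ("relaxed", False)],
    "Intellect": [("deep", False), ("creative", False)],
}

SCALE_MIN, SCALE_MAX = 1, 9  # 1 = Extremely Inaccurate, 9 = Extremely Accurate


def score_domains(ratings: dict[str, int]) -> dict[str, float]:
    """Average the (possibly reverse-keyed) ratings of each domain's markers."""
    scores = {}
    for domain, markers in SCORING_KEY.items():
        values = []
        for marker, reverse in markers:
            r = ratings[marker]
            if not SCALE_MIN <= r <= SCALE_MAX:
                raise ValueError(f"rating for {marker!r} out of range: {r}")
            # Flip reverse-keyed markers so a high value always means
            # "more" of the domain: 1 <-> 9, 2 <-> 8, and so on.
            values.append(SCALE_MIN + SCALE_MAX - r if reverse else r)
        scores[domain] = mean(values)
    return scores
```

For example, rating *talkative* 8 and *quiet* 2 gives a Surgency score of 8, because the reverse-keyed 2 counts as 8.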


#### *1.2. Machine Learning for Personality Assessment*

In recent years, Machine Learning (ML) has risen to prominence. In particular, the use of ML models to predict personality traits has gained significant popularity within the field of Affective Computing, with several studies having already designed ML models for personality assessment [12,14–16]. In 2017, Majumder et al. designed and evaluated Deep Learning (DL) models to assess personality from text. They built and trained five artificial neural networks (ANN), one for each of the Big Five personality traits. All networks shared the same architecture, with each ANN acting as a binary classifier that predicts whether the trait is positive or negative [14]. As dataset, the authors used James Pennebaker and Laura King's stream-of-consciousness essay dataset, which contains 2468 anonymous essays tagged with a binary value for each of the Big Five [17]. This dataset, however, seems to be currently unavailable. In fact, several datasets containing anonymized psychological assessments seem to have been closed, such as the one provided by the myPersonality platform (myPersonality.org). This platform made available a dataset containing textual social media data, from which several studies emerged, mostly focused on modelling personality traits from language-based information [18,19].
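
The one-model-per-trait design can be sketched as follows. As a deliberate simplification, a single logistic unit trained by gradient descent stands in for each artificial neural network; the trait names, learning rate and epoch count are assumptions, and the toy features below are not derived from essays. The point is the structure: five independent binary classifiers with identical architecture whose outputs are combined into one profile.

```python
import math

TRAITS = ("Openness", "Conscientiousness", "Extraversion", "Agreeableness", "Neuroticism")


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary(X, y, lr=0.5, epochs=2000):
    """Fit one logistic unit (weights, bias) by batch gradient descent on log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the logit is (prediction - target).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_binary(model, x):
    """Return 1 (trait 'positive') if the logit is positive, else 0 ('negative')."""
    w, b = model
    return int(sum(wj * xj for wj, xj in zip(w, x)) + b > 0)


def train_per_trait(X, labels):
    """Train one independent classifier per trait, all with the same architecture.

    labels maps each trait to a list of 0/1 targets aligned with the rows of X.
    """
    return {trait: train_binary(X, labels[trait]) for trait in TRAITS}


def predict_profile(models, x):
    """Combine the five independent binary decisions into one personality profile."""
    return {trait: predict_binary(models[trait], x) for trait in TRAITS}
```

Because the classifiers are independent, each trait can be trained, tuned or replaced without touching the others, at the cost of ignoring correlations between traits.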

In a slightly different domain, also in 2017, Yu and Markov designed and evaluated several DL models to learn suitable data representations for personality assessment from Facebook status updates. This dataset consisted of raw text, user information and standard Big Five labels obtained through self-assessment questionnaires [15]. Indeed, several studies focus on inferring personality from social media feeds. For instance, Kosinski et al. (2014) examined how an individual's personality manifests in his/her online behaviour, in particular the websites he/she visits and his/her Facebook activity. The expectation was that web activity combined with social media data may provide unbiased insights, since social media feeds may carry an intention of self-enhancement and positivity [12]. The dataset was obtained from myPersonality. The results showed psychologically meaningful links between individuals' personalities, website preferences and social media data. The potential applications of these works are mainly related to targeted advertising and personalised recommender systems, which take one's personality into consideration to deliver useful content.

In 2012, Sumner et al. focused on identifying signals of the Dark Triad, i.e., the anti-social traits of *Narcissism*, *Machiavellianism* and *Psychopathy*, from Twitter use. Almost three thousand Twitter users from 89 countries participated in the study, with a dedicated Twitter application developed to collect self-reported ratings on the Short Dark Triad questionnaire, which measures the anti-social traits, and on the Ten Item Personality Inventory (TIPI), which measures the Big Five. The authors concluded that, even though it is possible to examine large groups of people, the conceived ML models perform poorly when applied to individuals, being imprecise when predicting Dark Triad traits from Twitter activity alone [16].

Another study, by Cerasa et al. (2018), designed and evaluated ML models to identify individuals with gambling disorder. To build the dataset, healthy individuals and individuals with the disorder were asked to complete the NEO-PI-R test, an operationalization of the five-factor model. The authors employed Classification and Regression Trees (CART) and evaluated performance using the area under the ROC curve (AUC). The best candidate model identified individuals with gambling disorder with an AUC of approximately 77% [20].
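
To make the reported figure concrete, the AUC can be read as the probability that a model scores a randomly chosen individual with the disorder higher than a randomly chosen healthy one. The sketch below computes it directly from that pairwise definition, counting ties as one half; `roc_auc` is an illustrative name and the scores used with it are made up, not data from the cited study.

```python
from itertools import product


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    labels are 1 (positive) or 0 (negative); ties count as half a correct
    ranking, as in the Mann-Whitney U formulation.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))
```

With scores `[0.9, 0.8, 0.4, 0.3, 0.7]` and labels `[1, 1, 1, 0, 0]`, five of the six positive/negative pairs are ranked correctly, giving an AUC of 5/6. The pairwise loop is quadratic, which is fine for small samples; rank-based formulations scale better.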

Other studies have used audio and video data with DL-based models to predict personality [21]. One study, by Levitan et al. (2016), focused on the automatic identification of traits such as gender, deception and personality using acoustic-prosodic and lexical features [22]. In particular, the authors focused on the automatic detection of deception. They used the Columbia deception corpus, which consists of deceptive and non-deceptive speech from native speakers of Standard American English and Mandarin Chinese, comprising more than one hundred hours of speech with self-identified truth/lie labels [23]. The authors also collected demographic data from each subject and administered the NEO-FFI personality test to assess the Big Five. Each trait was binned into a three-class classification problem (*low*, *medium* and *high*), which created a highly unbalanced dataset since most subjects fell into the *medium* class. Hence, the authors compared models using F-scores, which give a more meaningful comparison than accuracy under such class imbalance. Several ML models and feature sets were tested, with AdaBoost and Random Forests being the best-performing classifiers for personality assessment [22].
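
The exact F-score variant used in the study is not stated here, so the sketch below assumes a macro-averaged F1, i.e., the unweighted mean of per-class F1 scores. With a hypothetical, *medium*-dominated label set, a trivial classifier that always predicts *medium* reaches 80% accuracy but a macro F1 of only about 0.30, which illustrates why accuracy alone would be misleading.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, classes=("low", "medium", "high")):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN).

    A class that is never present and never predicted contributes 0 here;
    other libraries may handle that corner case differently.
    """
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Hypothetical imbalanced labels and a majority-class baseline.
y_true = ["medium"] * 8 + ["low", "high"]
y_pred = ["medium"] * 10
```

Here `accuracy(y_true, y_pred)` is 0.8, while `macro_f1(y_true, y_pred)` is 8/27, because the *low* and *high* classes each get an F1 of zero.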

Another study, by Gurpinar et al. (2016), used DL to predict the Big Five traits of people appearing in videos from their faces [24]. The authors employed transfer learning and Convolutional Neural Networks to extract facial expressions as well as ambient information. The models were evaluated on the *ChaLearn Challenge Dataset on First Impression Recognition*, which consists of ten thousand clips collected from more than five thousand YouTube videos. The label of each clip corresponds to the Big Five personality traits of the person appearing in that clip. Their best candidate model achieved an accuracy of over 90% on the test set [24].
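
Since each clip is labelled with continuous trait scores, the reported "accuracy" should not be read as classification accuracy. Assuming it is a challenge-style mean accuracy, i.e., one minus the mean absolute error between predicted and true trait values in [0, 1] (an assumption to be checked against the challenge description), it could be computed as in the sketch below; `mean_accuracy` is an illustrative name and the values in the test are made up.

```python
def mean_accuracy(y_true, y_pred):
    """1 - mean absolute error, averaged over all clips and all traits.

    y_true and y_pred are lists of per-clip score vectors with values in
    [0, 1], one entry per Big Five trait.
    """
    total, count = 0.0, 0
    for t_vec, p_vec in zip(y_true, y_pred):
        for t, p in zip(t_vec, p_vec):
            total += abs(t - p)
            count += 1
    return 1.0 - total / count
```

Because errors are bounded by the [0, 1] range, a constant prediction near the middle of the range can already score well when labels cluster there, so such figures are best compared against a simple baseline.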

It is also common to combine different data sources through data fusion for personality assessment. Indeed, personality assessment from multi-modal data has been gaining importance in the computer vision field [25]. For instance, Gucluturk et al. (2017) analysed which features personality trait assessment models rely on when making predictions, conducting several experiments that characterised the audio and visual information driving such predictions [25]. On the other hand, Zhang et al. (2016) proposed a Deep Bimodal Regression framework to capture rich information from both the visual and audio aspects of videos, winning the *ChaLearn Looking at People* challenge. Convolutional Neural Networks were designed to exploit visual cues, while linear regressors were used for audio [26].
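
One simple way to combine modalities is late fusion, where each modality produces its own per-trait predictions and these are merged afterwards. The sketch below is an assumption, not the exact fusion scheme of the cited framework: it pairs a per-trait linear regressor (the model type mentioned for audio) with a weighted average of visual and audio predictions. The trait keys, weights and the `visual_weight` parameter are hypothetical.

```python
TRAITS = ("O", "C", "E", "A", "N")


def linear_predict(features, weights, biases):
    """Per-trait linear regressor: score_t = w_t . x + b_t (hypothetical audio model)."""
    return {
        t: sum(w * x for w, x in zip(weights[t], features)) + biases[t]
        for t in TRAITS
    }


def late_fusion(visual_pred, audio_pred, visual_weight=0.5):
    """Combine per-trait predictions from two modalities by weighted averaging."""
    if not 0.0 <= visual_weight <= 1.0:
        raise ValueError("visual_weight must lie in [0, 1]")
    audio_weight = 1.0 - visual_weight
    return {
        t: visual_weight * visual_pred[t] + audio_weight * audio_pred[t]
        for t in TRAITS
    }
```

Late fusion keeps each modality's model independent, so the visual network and the audio regressors can be trained separately; the fusion weight could then be tuned on validation data.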

#### *1.3. Hypothesis and Paper Structure*

Many studies have already used ML or DL for personality assessment from images, videos, audio or text. However, to the best of our knowledge, we are the first to apply ML to reduce the complexity of a test. The working hypothesis is that ML-based models make it possible to further reduce Saucier's Mini-Marker test to a "game of words" where the subject, instead of rating forty adjectives, only has to select those he/she relates to the most. The proposed Adjective Selection to Assess Personality (ASAP) method thus replaces the entire rating process with an adjective selection process. The goal is to reduce the complexity of tests and the time it takes to complete them, and to make tests more attractive and easier to implement in current and future technological platforms. Hence, this study aims to conceive, tune and evaluate two distinct Gradient Boosting ML architectures to quantify an individual's personality based on his/her choice of adjectives. Due to the non-availability of data, a web platform was developed and placed online to handle the entire data collection process. To conduct experiments in non-data-scarce environments, data augmentation techniques were designed and implemented to produce a second dataset, which was also evaluated.
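
This section does not specify how selections are encoded for the models. As a minimal sketch, assuming each subject's selection becomes a fixed-length multi-hot vector with one entry per adjective, the input features for a Gradient Boosting model could be built as follows. The eight-word `VOCABULARY` is a hypothetical stand-in for the forty Mini-Marker adjectives, and `encode_selection` is an illustrative name.

```python
# Hypothetical vocabulary standing in for the forty Mini-Marker adjectives,
# in a fixed order that defines the feature positions.
VOCABULARY = ("talkative", "sympathetic", "careless", "envious",
              "deep", "quiet", "kind", "organized")
INDEX = {adj: i for i, adj in enumerate(VOCABULARY)}


def encode_selection(selected):
    """Turn a subject's selected adjectives into a 0/1 feature vector.

    Position i is 1 if VOCABULARY[i] was selected and 0 otherwise, so the
    order of selection and repeated selections do not matter.
    """
    vector = [0] * len(VOCABULARY)
    for adj in selected:
        key = adj.strip().lower()
        if key not in INDEX:
            raise ValueError(f"unknown adjective: {adj!r}")
        vector[INDEX[key]] = 1
    return vector
```

Because position, not order of selection, carries the information, the same set of adjectives always yields the same vector, and each trait score can then be regressed on it.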

The remainder of this paper is structured as follows. Section 2 describes the materials and methods, in particular the platform developed for data collection, the data exploration, the implemented data augmentation techniques, the conceived ML architectures, the experimental setup and the conducted experiments. Section 3 summarises the results, providing a concise description of the experimental results and their interpretation. Section 4 discusses the results and their interpretation in the light of previous studies and of the working hypothesis, presenting the main conclusions and pointing to future research directions.

#### **2. Materials and Methods**

Due to the non-availability of data and the particularities of the proposed ASAP method, we developed a web platform for data collection that asks subjects to rate adjectives and to select those that describe them the most. This allowed us to build a dataset containing self-reported ratings on Saucier's Mini-Marker test, the corresponding Big Five values and the adjectives selected by the subjects. The following subsections describe the developed platform in detail, explore the collected dataset and explain the implemented data augmentation techniques. They also detail the conceived ML architectures and the experimental setup.
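
As a minimal sketch of what one collected record might hold, assuming a flat schema with the three components listed above, each submission could be validated on arrival. The `Submission` class, its field names and the rule that selected adjectives must also have been rated are hypothetical, not the platform's actual data model.

```python
from dataclasses import dataclass

BIG_FIVE = ("Surgency", "Agreeableness", "Conscientiousness",
            "Emotional Stability", "Intellect")


@dataclass(frozen=True)
class Submission:
    """One subject's answers as collected by the platform (hypothetical schema)."""
    ratings: dict[str, int]     # adjective -> rating on the 1-9 scale
    big_five: dict[str, float]  # domain -> score derived from the ratings
    selected: frozenset         # adjectives chosen as describing the subject best

    def __post_init__(self):
        # Reject ratings outside the Mini-Marker scale.
        for adj, r in self.ratings.items():
            if not 1 <= r <= 9:
                raise ValueError(f"rating for {adj!r} must be in 1..9, got {r}")
        # Every record needs a score for each of the five domains.
        missing = set(BIG_FIVE) - set(self.big_five)
        if missing:
            raise ValueError(f"missing Big Five scores: {sorted(missing)}")
        # Assumption: subjects select only among the adjectives they rated.
        unknown = self.selected - set(self.ratings)
        if unknown:
            raise ValueError(f"selected adjectives were never rated: {sorted(unknown)}")
```

Validating records at collection time keeps malformed submissions out of the dataset before any exploration or augmentation takes place.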
