Identification and Prediction of Human Behavior through Mining of Unstructured Textual Data
Abstract
1. Introduction
- Extraversion: The first trait is extraversion, which is frequently associated with being outgoing, talkative, energetic, enthusiastic, active, and assertive. Extraversion involves positive emotions and a sociable tendency. In terms of job performance, the extraversion dimension is a good indicator of performance for managers and salespeople [20].
- Agreeableness: The second trait is agreeableness, which is most often associated with being kind, sympathetic, forgiving, generous, and appreciative. Agreeableness involves trusting and cooperative tendencies.
- Conscientiousness: The third trait is conscientiousness, which is associated with being responsible, planful, reliable, efficient, and organized. Conscientiousness involves scrupulous and diligent tendencies. In terms of job performance, the conscientiousness dimension is the best indicator of performance in every job type [20].
- Neuroticism: The fourth trait is neuroticism, which is associated with being unstable, worried, tense, touchy, anxious, and self-pitying. Neuroticism involves sensitivity to danger and a tendency toward psychological distress. It is more strongly associated with anxiety than the other traits [21]. People high in neuroticism show low emotional stability and lower stress tolerance, and tend to experience negative emotions [22].
- Openness to experience: The fifth trait is openness to experience, which is associated with being original, widely interested, imaginative, insightful, artistic, and curious. It involves a willingness to consider other options and alternatives, and a tendency toward curiosity [23]. In terms of job performance, the openness to experience dimension is a good indicator of training proficiency [20].
2. Methodology
3. Results
4. Discussion
- Facebook’s pre-defined features include personal information, work information, contact information, education, time spent on Facebook, frequency of use, number of statuses, number of friends, number of groups, number of likes, number of photos, and number of tags.
- Twitter’s pre-defined features include the number of followings, followers, retweets, hashtags, and links.
- In other social media, pre-defined features include personal information and time spent on Instagram, Sina Weibo, or LinkedIn.
- In Linguistic Inquiry and Word Count (LIWC) features, words are tagged in different sections including linguistic (e.g., adjective, pronoun, or noun), relativity (e.g., past, time, or future), personal process (e.g., family, home, or job) and psychological process (e.g., emotions).
- In Natural Language Toolkit (NLTK) features, words are tagged into 89 categories to mainly extract grammatical features.
- Word frequency-based features include frequency of depression phrases, frequency of the first and/or second and/or third-person pronouns, 1000 most frequent words, and frequency of positive or negative words.
- Character frequency-based features include frequency of question marks, punctuation marks, exclamation marks, negative and positive emoticons, or capitalized letters.
- Term frequency-inverse document frequency (TF-IDF) features are based on the number of times a word appears in a document and the number of documents in the corpus that contain the word [109].
- Latent Dirichlet Allocation (LDA) features are based on assigning topics to documents and generating topic distributions over words, given a collection of texts [110].
- Linguistic features are based on pre-processing (removing stop words, stemming, and word segmentation tools) and semantic analysis.
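As a rough illustration of two of the feature families listed above, the following minimal Python sketch computes character frequency-based features and TF-IDF weights for a toy corpus. The example posts, the function names, and the specific weighting (term count divided by document length, multiplied by log(N/df)) are assumptions made for illustration; the reviewed studies may use other variants, and this is not the implementation of any particular study.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; not data from any of the reviewed studies.
posts = [
    "I love meeting new people!!! :)",
    "Why am I always so tired and worried?",
    "New people, new ideas, new places.",
]


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens only."""
    return re.findall(r"[a-z']+", text.lower())


def char_features(text):
    """Count a few of the character-level cues named in the list above."""
    return {
        "question_marks": text.count("?"),
        "exclamation_marks": text.count("!"),
        "capital_letters": sum(ch.isupper() for ch in text),
    }


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: tf = count / document length, idf = log(N / df).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


features = [char_features(p) for p in posts]
weights = tf_idf(posts)
```

In this sketch, a word that occurs in only one post (such as "love") receives a higher weight than a word shared across posts (such as "new" in the first post), which is the behavior the TF-IDF description above implies.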
4.1. Identifying Human Behavior
4.2. Predicting Human Behavior
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Fournier, M.A.; Moskowitz, D.S.; Zuroff, D.C. Integrating dispositions, signatures, and the interpersonal domain. J. Pers. Soc. Psychol. 2008, 94, 531–545. [Google Scholar] [CrossRef] [PubMed]
- Maddock, J.; Starbird, K.; Al-Hassani, H.J.; Sandoval, D.E.; Orand, M.; Mason, R.M. Characterizing online rumoring behavior using multi-dimensional signatures. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, Vancouver, BC, Canada, 14–18 March 2015; pp. 228–241. [Google Scholar]
- Makeig, S.; Gramann, K.; Jung, T.-P.; Sejnowski, T.J.; Poizner, H. Linking brain, mind and behavior. Int. J. Psychophysiol. 2009, 73, 95–100. [Google Scholar] [CrossRef] [PubMed]
- Shen, Z.; Su, J. Web service discovery based on behavior signatures. In Proceedings of the 2005 IEEE International Conference on Services Computing (SCC’05) Vol-1, Orlando, FL, USA, 11–15 July 2005; Volume 1, pp. 279–286. [Google Scholar]
- Shoda, Y. Behavioral expressions of a personality system. Coherence Personal. Soc. Cogn. Bases Consistency Var. Organ. 1999, 29, 155–181. [Google Scholar]
- Shoda, Y.; Wilson, N.L.; Whitsett, D.D.; Lee-Dussud, J.; Zayas, V. The person as a cognitive-affective processing system: From quantitative idiography to cumulative science. Handb. Personal. Process. Individ. Differ. 2014, 4, 491–513. [Google Scholar]
- Sticha, P.J.; Weaver, E.A.; Tatman, J.A.; Mahoney, S.M.; Buede, D.M. Reading the Behavior Signature: Predicting Leader Personality from Individual and Group Actions. In Proceedings of the AAAI Spring Symposium: Technosocial Predictive Analytics, Stanford, CA, USA, 23–25 March 2009; pp. 130–136. [Google Scholar]
- Farnadi, G.; Sitaraman, G.; Sushmita, S.; Celli, F.; Kosinski, M.; Stillwell, D.; Davalos, S.; Moens, M.-F.; De Cock, M. Computational personality recognition in social media. User Model. User-Adapt. Interact. 2016, 26, 109–142. [Google Scholar] [CrossRef] [Green Version]
- Fatima, I.; Mukhtar, H.; Ahmad, H.F.; Rajpoot, K. Analysis of user-generated content from online social communities to characterise and predict depression degree. J. Inf. Sci. 2018, 44, 683–695. [Google Scholar] [CrossRef]
- Azucar, D.; Marengo, D.; Settanni, M. Predicting the Big 5 personality traits from digital footprints on social media: A meta-analysis. Personal. Individ. Differ. 2018, 124, 150–159. [Google Scholar] [CrossRef]
- Fiok, K.; Karwowski, W.; Gutierrez, E.; Reza-Davahli, M. Comparing the Quality and Speed of Sentence Classification with Modern Language Models. Appl. Sci. 2020, 10, 3386. [Google Scholar] [CrossRef]
- Thelwall, M.; Buckley, K.; Paltoglou, G.; Cai, D.; Kappas, A. Sentiment strength detection in short informal text. J. Am. Soc. Inf. Sci. Technol. 2010, 61, 2544–2558. [Google Scholar] [CrossRef] [Green Version]
- Boumi, S.; Vela, A.; Chini, J. Quantifying the relationship between student enrollment patterns and student performance. arXiv 2020, arXiv:2003.10874. [Google Scholar]
- Markovikj, D.; Gievska, S.; Kosinski, M.; Stillwell, D. Mining Facebook Data for Predictive Personality Modeling. In Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media, Cambridge, MA, USA, 8–11 July 2013; pp. 23–26. [Google Scholar]
- Gou, L.; Zhou, M.X.; Yang, H. KnowMe and ShareMe: Understanding automatically discovered personality traits from social media and user sharing preferences. In Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems—CHI ’14, Toronto, ON, Canada, 26 April–1 May 2014; ACM Press: Toronto, ON, Canada, 2014; pp. 955–964. [Google Scholar]
- Gupta, U.; Chatterjee, N. Personality Traits Identification Using Rough Sets Based Machine Learning. In Proceedings of the 2013 International Symposium on Computational and Business Intelligence, New Delhi, India, 24–26 August 2013; pp. 182–185. [Google Scholar]
- Vinciarelli, A.; Mohammadi, G. A Survey of Personality Computing. IEEE Trans. Affect. Comput. 2014, 5, 273–291. [Google Scholar] [CrossRef] [Green Version]
- Staiano, J.; Lepri, B.; Aharony, N.; Pianesi, F.; Sebe, N.; Pentland, A. Friends don’t lie: Inferring personality traits from social network structure. In Proceedings of the 2012 ACM Conference on Ubiquitous Computing, Pittsburgh, PA, USA, 5–8 September 2012; pp. 321–330. [Google Scholar] [CrossRef] [Green Version]
- Ortigosa, A.; Carro, R.M.; Quiroga, J.I. Predicting user personality by mining social interactions in Facebook. J. Comput. Syst. Sci. 2014, 80, 57–71. [Google Scholar] [CrossRef]
- Barrick, M.R.; Mount, M.K. The Big Five Personality Dimensions and Job Performance: A Meta-Analysis. Pers. Psychol. 1991, 44, 1–26. [Google Scholar] [CrossRef]
- Golbeck, J.; Robles, C.; Turner, K. Predicting personality with social media. In Proceedings of the 2011 Annual Conference Extended Abstracts on Human Factors in Computing Systems—CHI EA ’11, Vancouver, BC, Canada, 7–12 May 2011; ACM Press: Vancouver, BC, Canada, 2011; pp. 253–262. [Google Scholar]
- Wald, R.; Khoshgoftaar, T.; Sumner, C. Machine prediction of personality from Facebook profiles. In Proceedings of the 2012 IEEE 13th International Conference on Information Reuse & Integration (IRI), Las Vegas, NV, USA, 8–10 August 2012; pp. 109–115. [Google Scholar]
- Amichai-Hamburger, Y.; Vinitzky, G. Social network use and personality. Comput. Hum. Behav. 2010, 26, 1289–1295. [Google Scholar] [CrossRef]
- Liu, Y.; Liu, T.; Wang, Y.J. Research on micro-blog character analysis based on Naïve Bayes. In Proceedings of the Seventh International Conference on Digital Image Processing (ICDIP 2015); International Society for Optics and Photonics, Los Angeles, CA, USA, 9–10 April 2015; Volume 9631, pp. 96312F1–96312F5. [Google Scholar]
- Sumner, C.; Byers, A.; Boochever, R.; Park, G.J. Predicting Dark Triad Personality Traits from Twitter Usage and a Linguistic Analysis of Tweets. In Proceedings of the 2012 11th International Conference on Machine Learning and Applications, Boca Raton, FL, USA, 12–15 December 2012; Volume 2, pp. 386–393. [Google Scholar]
- Liberati, A.; Altman, D.G.; Tetzlaff, J.; Mulrow, C.; Gøtzsche, P.C.; Ioannidis, J.P.A.; Clarke, M.; Devereaux, P.J.; Kleijnen, J.; Moher, D. The PRISMA Statement for Reporting Systematic Reviews and Meta-Analyses of Studies That Evaluate Health Care Interventions: Explanation and Elaboration. PLOS Med. 2009, 6, e1000100. [Google Scholar] [CrossRef]
- Davahli, M.R.; Karwowski, W.; Taiar, R. A System Dynamics Simulation Applied to Healthcare: A Systematic Review. Int. J. Environ. Res. Public Health 2020, 17, 5741. [Google Scholar] [CrossRef]
- National Heart, Lung, and Blood Institute (NHLBI). Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies. National Heart, Lung, and Blood Institute: Bethesda, MD, USA. Available online: https://www.nhlbi.nih.gov/health-topics/study-quality-assessment-tools (accessed on 24 May 2019).
- Frost, R.L.; Rickwood, D.J. A systematic review of the mental health outcomes associated with Facebook use. Comput. Hum. Behav. 2017, 76, 576–600. [Google Scholar] [CrossRef]
- Carbia, C.; López-Caneda, E.; Corral, M.; Cadaveira, F. A systematic review of neuropsychological studies involving young binge drinkers. Neurosci. Biobehav. Rev. 2018, 90, 332–349. [Google Scholar] [CrossRef]
- Adali, S.; Golbeck, J. Predicting Personality with Social Behavior. In Proceedings of the 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Istanbul, Turkey, 26–29 August 2012; pp. 302–309. [Google Scholar]
- Agarwal, S.; Sureka, A. Role of Author Personality Traits for Identifying Intent Based Racist Posts. In Proceedings of the 2016 European Intelligence and Security Informatics Conference (EISIC), Uppsala, Sweden, 17–19 August 2016; p. 197. [Google Scholar]
- Alam, F.; Stepanov, E.A.; Riccardi, G. Personality Traits Recognition on Social Network—Facebook. In Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media, Cambridge, MA, USA, 8–11 July 2013. [Google Scholar]
- Alsadhan, N.; Skillicorn, D. Estimating Personality from Social Media Posts. In Proceedings of the IEEE 2017 IEEE International Conference on Data Mining Workshops (ICDMW), New Orleans, LA, USA, 18–21 November 2017; pp. 350–356. [Google Scholar]
- Annisette, L.E.; Lafreniere, K.D. Social media, texting, and personality: A test of the shallowing hypothesis. Personal. Individ. Differ. 2017, 115, 154–158. [Google Scholar] [CrossRef]
- Argamon, S.; Dhawle, S.; Koppel, M.; Pennebaker, J.W. Lexical predictors of personality type. In Proceedings of the 2005 Joint Annual Meeting of the Interface and the Classification Society of North America, St. Louis, MO, USA, 8–12 June 2005; pp. 1–16. [Google Scholar]
- Ashton, M. Personality and job performance: The importance of narrow traits. J. Organ. Behav. 1998, 19, 289–303. [Google Scholar] [CrossRef]
- Bachrach, Y.; Kosinski, M.; Graepel, T.; Kohli, P.; Stillwell, D. Personality and patterns of Facebook usage. In Proceedings of the 4th Annual ACM Web Science Conference; ACM: New York, NY, USA, 2012; pp. 24–32. [Google Scholar]
- Bai, S.; Zhu, T.; Cheng, L. Big-Five Personality Prediction Based on User Behaviors at Social Network Sites. arXiv 2012, arXiv:1204.4809. [Google Scholar]
- Bai, S.; Yuan, S.; Hao, B.; Zhu, T. Predicting personality traits of microblog users. Web Intell. Agent Syst. Int. J. 2014, 12, 249–265. [Google Scholar] [CrossRef] [Green Version]
- Ben-Ari, A.; Hammond, K. Text mining the EMR for modeling and predicting suicidal behavior among US veterans of the 1991 Persian Gulf War. In Proceedings of the IEEE 2015 48th Hawaii International Conference on System Sciences, Kauai, HI, USA, 5–8 January 2015; pp. 3168–3175. [Google Scholar]
- Bhattacharya, S.; Yang, C.; Srinivasan, P.; Boynton, B. Perceptions of presidential candidates’ personalities in twitter. J. Assoc. Inf. Sci. Technol. 2016, 67, 249–267. [Google Scholar] [CrossRef]
- Celli, F.; Poesio, M. PR2: A Language Independent Unsupervised Tool for Personality Recognition from Text. arXiv 2014, arXiv:1402.2796. [Google Scholar]
- Celli, F.; Polonio, L. Relationships between Personality and Interactions in Facebook. In Social Networking: Recent Trends, Emerging Issues and Future Outlook; Nova Science Publishers: Hauppauge, NY, USA, 2013; pp. 41–54. [Google Scholar]
- Celli, F.; Rossi, L. The Role of Emotional Stability in Twitter Conversations. In Proceedings of the Workshop on Semantic Analysis in Social Media, Avignon, France, 12 April 2012; Association for Computational Linguistics: Stroudsburg, PA, USA, 2012; pp. 10–17. [Google Scholar]
- Chapsky, D. Leveraging Online Social Networks and External Data Sources to Predict Personality. In Proceedings of the 2011 International Conference on Advances in Social Networks Analysis and Mining, Kaohsiung, Taiwan, 25–27 July 2011; pp. 428–433. [Google Scholar]
- Chittaranjan, G.; Blom, J.; Gatica-Perez, D. Who’s Who with Big-Five: Analyzing and Classifying Personality Traits with Smartphones. In Proceedings of the 2011 15th Annual International Symposium on Wearable Computers, San Francisco, CA, USA, 12–15 June 2011; pp. 29–36. [Google Scholar]
- Chittaranjan, G.; Blom, J.; Gatica-Perez, D. Mining large-scale smartphone data for personality studies. Pers. Ubiquitous Comput. 2013, 17, 433–450. [Google Scholar] [CrossRef]
- Devaraj, S.; Easley, R.F.; Crant, J.M. How Does Personality Matter? Relating the Five-Factor Model to Technology Acceptance and Use. Inf. Syst. Res. 2008, 19, 93–105. [Google Scholar] [CrossRef]
- Farnadi, G.; Zoghbi, S.; Moens, M.-F.; De Cock, M. Recognising personality traits using Facebook status updates. In Proceedings of the workshop on computational personality recognition (WCPR13) at the 7th international AAAI conference on weblogs and social media (ICWSM13), AAAI, Boston, MA, USA, 11 June 2013. [Google Scholar]
- Gao, R.; Hao, B.; Bai, S.; Li, L.; Li, A.; Zhu, T. Improving user profile with personality traits predicted from social media content. In Proceedings of the 7th ACM Conference on Recommender Systems—RecSys ’13, Hong Kong, China, October 2013; ACM Press: Hong Kong, China, 2013; pp. 355–358. [Google Scholar]
- Golbeck, J. Predicting Personality from Social Media Text. AIS Trans. Replication Res. 2016, 2, 1–10. [Google Scholar] [CrossRef]
- Hammond, K.W.; Laundry, R.J. Application of a Hybrid Text Mining Approach to the Study of Suicidal Behavior in a Large Population. In Proceedings of the 2014 47th Hawaii International Conference on System Sciences, Waikoloa, HI, USA, 6–9 January 2014; pp. 2555–2561. [Google Scholar]
- He, Q.; Veldkamp, B.P.; Vries, T. Screening for posttraumatic stress disorder using verbal features in self narratives: A text mining approach. Psychiatry Res. 2012, 198, 441–447. [Google Scholar] [CrossRef]
- He, Q.; Glas, C.A.W.; Kosinski, M.; Stillwell, D.J.; Veldkamp, B.P. Predicting self-monitoring skills using textual posts on Facebook. Comput. Hum. Behav. 2014, 33, 69–78. [Google Scholar] [CrossRef]
- Holtgraves, T. Text messaging, personality, and the social context. J. Res. Personal. 2011, 45, 92–99. [Google Scholar] [CrossRef]
- Hu, Z.; Liu, Y.; Zhang, C.; Xu, Y. The analysis of topic’s personality traits using a new topic model. In Proceedings of the 2017 IEEE 2nd International Conference on Image, Vision and Computing (ICIVC), Chengdu, China, 2–4 June 2017; pp. 1079–1083. [Google Scholar]
- Iacobelli, F.; Culotta, A. Too Neurotic, Not Too Friendly: Structured Personality Classification on Textual Data. In Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media, Cambridge, MA, USA, 8–11 July 2013. [Google Scholar]
- Jenkins-Guarnieri, M.A.; Wright, S.L.; Hudiburgh, L.M. The relationships among attachment style, personality traits, interpersonal competency, and Facebook use. J. Appl. Dev. Psychol. 2012, 33, 294–301. [Google Scholar] [CrossRef]
- Kaati, L.; Shrestha, A.; Sardella, T. Identifying warning behaviors of violent lone offenders in written communication. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), Barcelona, Spain, 12–15 December 2016; pp. 1053–1060. [Google Scholar]
- Kalghatgi, M.P.; Ramannavar, M.; Sidnal, N.S. A neural network approach to personality prediction based on the big-five model. Int. J. Innov. Res. Adv. Eng. IJIRAE 2015, 2, 56–63. [Google Scholar]
- Kartelj, A.; Filipović, V.; Milutinović, V. Novel approaches to automated personality classification: Ideas and their potentials. In Proceedings of the 2012 35th International Convention MIPRO, Opatija, Croatia, 21–25 May 2012; pp. 1017–1022. [Google Scholar]
- Kermanidis, K.L. Mining authors’ personality traits from modern greek spontaneous text. In Proceedings of the Workshop on Corpora for Research on Emotion Sentiment & Social Signals, in conjunction with LREC, Istanbul, Turkey, 26 May 2012; pp. 90–93. [Google Scholar]
- Kern, M.L.; Eichstaedt, J.C.; Schwartz, H.A.; Dziurzynski, L.; Ungar, L.H.; Stillwell, D.J.; Kosinski, M.; Ramones, S.M.; Seligman, M.E.P. The Online Social Self: An Open Vocabulary Approach to Personality. Assessment 2014, 21, 158–169. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Kosinski, M.; Stillwell, D.; Graepel, T. Private traits and attributes are predictable from digital records of human behavior. Proc. Natl. Acad. Sci. USA 2013, 110, 5802–5805. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Kosinski, M.; Bachrach, Y.; Kohli, P.; Stillwell, D.; Graepel, T. Manifestations of user personality in website choice and behaviour on online social networks. Mach. Learn. 2014, 95, 357–380. [Google Scholar] [CrossRef] [Green Version]
- Krämer, N.C.; Winter, S. Impression management 2.0: The relationship of self-esteem, extraversion, self-efficacy, and self-presentation within social networking sites. J. Media Psychol. 2008, 20, 106–116. [Google Scholar] [CrossRef]
- Krishnamurthy, M.; Mahmood, K.; Marcinek, P. A hybrid statistical and semantic model for identification of mental health and behavioral disorders using social network analysis. In Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), San Francisco, CA, USA, 18–21 August 2016; pp. 1019–1026. [Google Scholar]
- Lima, A.C.E.S.; de Castro, L.N. Multi-label Semi-supervised Classification Applied to Personality Prediction in Tweets. In Proceedings of the 2013 BRICS Congress on Computational Intelligence and 11th Brazilian Congress on Computational Intelligence, Ipojuca, Brazil, 8–11 September 2013; pp. 195–203. [Google Scholar]
- Lima, A.C.E.S.; de Castro, L.N. A multi-label, semi-supervised classification approach applied to personality prediction in social media. Neural Netw. 2014, 58, 122–130. [Google Scholar] [CrossRef]
- Maria Balmaceda, J.; Schiaffino, S.; Godoy, D. How do personality traits affect communication among users in online social networks? Online Inf. Rev. 2014, 38, 136–153. [Google Scholar] [CrossRef] [Green Version]
- Moore, K.; McElroy, J.C. The influence of personality on Facebook usage, wall postings, and regret. Comput. Hum. Behav. 2012, 28, 267–274. [Google Scholar] [CrossRef]
- Neuman, Y.; Cohen, Y. A Vectorial Semantics Approach to Personality Assessment. Sci. Rep. 2014, 4, 4761. [Google Scholar] [CrossRef] [Green Version]
- Neuman, Y.; Cohen, Y.; Assaf, D.; Kedma, G. Proactive screening for depression through metaphorical and automatic text analysis. Artif. Intell. Med. 2012, 56, 19–25. [Google Scholar] [CrossRef] [PubMed]
- Nie, D.; Guan, Z.; Hao, B.; Bai, S.; Zhu, T. Predicting Personality on Social Media with Semi-supervised Learning. In Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), Warsaw, Poland, 11–14 August 2014; Volume 2, pp. 158–165. [Google Scholar]
- Nokhbeh Zaeem, R.; Manoharan, M.; Yang, Y.; Barber, K.S. Modeling and analysis of identity threat behaviors through text mining of identity theft stories. Comput. Secur. 2017, 65, 50–63. [Google Scholar] [CrossRef]
- Oberlander, J.; Nowson, S. Whose Thumb Is It Anyway? Classifying Author Personality from Weblog Text. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, Sydney, Australia, 17–18 July 2006; Association for Computational Linguistics: Sydney, Australia, 2006; pp. 627–634. [Google Scholar]
- Ou, G.; Li, J.; Guo, J.; Cai, Z.; Lu, M. The Bloggers’ Personality Traits Categorizing Algorithm Based on Text Features Analysis; Atlantis Press: Paris, France, 2016; pp. 1–6. [Google Scholar]
- Pabón, O.H.P.; González, F.A.; Aponte, J.; Camargo, J.E.; Restrepo-Calle, F. Finding Relationships between Socio-Technical Aspects and Personality Traits by Mining Developer E-mails. In Proceedings of the 2016 IEEE/ACM Cooperative and Human Aspects of Software Engineering (CHASE), Austin, TX, USA, 16 May 2016; pp. 8–14. [Google Scholar]
- Panicheva, P.; Ledovaya, Y.; Bogolyubova, O. Lexical, morphological and semantic correlates of the dark triad personality traits in russian facebook texts. In Proceedings of the 2016 IEEE Artificial Intelligence and Natural Language Conference (AINL), St. Petersburg, Russia, 10–12 November 2016; pp. 1–8. [Google Scholar]
- Park, G.; Schwartz, H.A.; Eichstaedt, J.C.; Kern, M.L.; Kosinski, M.; Stillwell, D.J.; Ungar, L.H.; Seligman, M.E.P. Automatic personality assessment through social media language. J. Pers. Soc. Psychol. 2015, 108, 934–952. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Peng, K.; Liou, L.; Chang, C.; Lee, D. Predicting personality traits of Chinese users based on Facebook wall posts. In Proceedings of the 2015 24th Wireless and Optical Communication Conference (WOCC), Taipei, Taiwan, 23–24 October 2015; pp. 9–14. [Google Scholar]
- Pramodh, K.C.; Vijayalata, Y. Automatic personality recognition of authors using big five factor model. In Proceedings of the 2016 IEEE International Conference on Advances in Computer Applications (ICACA), Coimbatore, India, 24 October 2016; IEEE: Coimbatore, India, 2016; pp. 32–37. [Google Scholar]
- Pratama, B.Y.; Sarno, R. Personality classification based on Twitter text using Naive Bayes, KNN and SVM. In Proceedings of the 2015 International Conference on Data and Software Engineering (ICoDSE), Yogyakarta, Indonesia, 25–26 November 2015; pp. 170–174. [Google Scholar]
- Preoţiuc-Pietro, D.; Eichstaedt, J.; Park, G.; Sap, M.; Smith, L.; Tobolsky, V.; Schwartz, H.A.; Ungar, L. The role of personality, age, and gender in tweeting about mental illness. In Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, Denver, CO, USA, 5 June 2015; pp. 21–30. [Google Scholar]
- Qiu, L.; Lin, H.; Ramsay, J.; Yang, F. You are what you tweet: Personality expression and perception on Twitter. J. Res. Personal. 2012, 46, 710–718. [Google Scholar] [CrossRef]
- Quercia, D.; Kosinski, M.; Stillwell, D.; Crowcroft, J. Our Twitter Profiles, Our Selves: Predicting Personality with Twitter. In Proceedings of the 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social Computing, Boston, MA, USA, 9–11 October 2011; pp. 180–185. [Google Scholar]
- Quercia, D.; Lambiotte, R.; Stillwell, D.; Kosinski, M.; Crowcroft, J. The Personality of Popular Facebook Users. In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, Seattle, WA, USA, 11–15 February 2012; ACM: New York, NY, USA, 2012; pp. 955–964. [Google Scholar]
- Reips, U.-D.; Garaizar, P. Mining twitter: A source for psychological wisdom of the crowds. Behav. Res. Methods 2011, 43, 635. [Google Scholar] [CrossRef]
- dos Santos, W.R.; Paraboni, I. Personality facets recognition from text. arXiv 2018, arXiv:1810.02980. [Google Scholar]
- Schwartz, H.A.; Eichstaedt, J.C.; Kern, M.L.; Dziurzynski, L.; Ramones, S.M.; Agrawal, M.; Shah, A.; Kosinski, M.; Stillwell, D.; Seligman, M.E.P.; et al. Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach. PLoS ONE 2013, 8, e73791. [Google Scholar] [CrossRef]
- Seibert, S.E.; Kraimer, M.L. The Five-Factor Model of Personality and Career Success. J. Vocat. Behav. 2001, 58, 1–21. [Google Scholar] [CrossRef]
- Seidman, G. Self-presentation and belonging on Facebook: How personality influences social media use and motivations. Personal. Individ. Differ. 2013, 54, 402–407. [Google Scholar] [CrossRef]
- Skues, J.L.; Williams, B.; Wise, L. The effects of personality traits, self-esteem, loneliness, and narcissism on Facebook use among university students. Comput. Hum. Behav. 2012, 28, 2414–2419. [Google Scholar] [CrossRef]
- Souri, A.; Hosseinpour, S.; Rahmani, A.M. Personality classification based on profiles of social networks’ users and the five-factor model of personality. Hum. Centric Comput. Inf. Sci. 2018, 8, 1–15. [Google Scholar] [CrossRef] [Green Version]
- Srividya, K.; Sowjanya, A.M. Behavioral analysis of internet messaging and malicious activity detection. In Proceedings of the 2016 International Conference on Advances in Human Machine Interaction (HMI), Doddaballapur, India, 3–5 March 2016; pp. 1–5. [Google Scholar]
- Tazghini, S.; Siedlecki, K.L. A mixed method approach to examining Facebook use and its relationship to self-esteem. Comput. Hum. Behav. 2013, 29, 827–832. [Google Scholar] [CrossRef]
- Uddin, M.F. Noise Removal and Structured Data Detection to Improve Search for Personality Features. In Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, San Francisco, CA, USA, 18–21 August 2016; IEEE Press: Piscataway, NJ, USA, 2016; pp. 1349–1355. [Google Scholar]
- Wang, S.S. “I Share, Therefore I Am”: Personality Traits, Life Satisfaction, and Facebook Check-Ins. Cyberpsychology Behav. Soc. Netw. 2013, 16, 870–877. [Google Scholar] [CrossRef] [PubMed]
- Wei, H.; Zhang, F.; Yuan, N.J.; Cao, C.; Fu, H.; Xie, X.; Rui, Y.; Ma, W.-Y. Beyond the Words: Predicting User Personality from Heterogeneous Information. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, Cambridge, UK, 6–10 February 2017; ACM: New York, NY, USA, 2017; pp. 305–314. [Google Scholar]
- Winter, S.; Neubaum, G.; Eimler, S.C.; Gordon, V.; Theil, J.; Herrmann, J.; Meinert, J.; Krämer, N.C. Another brick in the Facebook wall—How personality traits relate to the content of status updates. Comput. Hum. Behav. 2014, 34, 194–202. [Google Scholar] [CrossRef]
- Yarkoni, T. Personality in 100,000 Words: A large-scale analysis of personality and word use among bloggers. J. Res. Personal. 2010, 44, 363–373. [Google Scholar] [CrossRef] [Green Version]
- Yoon, S.; Elhadad, N.; Bakken, S. A Practical Approach for Content Mining of Tweets. Am. J. Prev. Med. 2013, 45, 122–129. [Google Scholar] [CrossRef] [Green Version]
- Zhou, X.; Han, H.; Chankai, I.; Prestrud, A.; Brooks, A. Approaches to Text Mining for Clinical Medical Records. In Proceedings of the 2006 ACM Symposium on Applied Computing, Dijon, France, 23–27 April 2006; ACM: New York, NY, USA, 2006; pp. 235–239. [Google Scholar]
- Kumar, K.P.; Gavrilova, M.L. Personality Traits Classification on Twitter. In Proceedings of the 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Taipei, Taiwan, 18–21 September 2019; pp. 1–8. [Google Scholar]
- Yang, H.-C.; Huang, Z.-R. Mining personality traits from social messages for game recommender systems. Knowl. Based Syst. 2019, 165, 157–168. [Google Scholar] [CrossRef]
- Zheng, H.; Wu, C. Predicting Personality Using Facebook Status Based on Semi-supervised Learning. In Proceedings of the 2019 11th International Conference on Machine Learning and Computing, Zhuhai, China, 22–24 February 2019; Association for Computing Machinery: Zhuhai, China, 2019; pp. 59–64. [Google Scholar]
- Irfan, R.; King, C.K.; Grages, D.; Ewen, S.; Khan, S.U.; Madani, S.A.; Kolodziej, J.; Wang, L.; Chen, D.; Rayes, A.; et al. A survey on text mining in social networks. Knowl. Eng. Rev. 2015, 30, 157–170. [Google Scholar] [CrossRef] [Green Version]
- Ramos, J. Using tf-idf to determine word relevance in document queries. In Proceedings of the First Instructional Conference on Machine Learning, Princeton, NJ, USA, 3 December 2003; Volume 242, pp. 133–142. [Google Scholar]
- Blei, D.M.; Ng, A.Y.; Jordan, M.I. Latent dirichlet allocation. J. Mach. Learn. Res. 2003, 3, 993–1022. [Google Scholar]
- Majumder, N.; Poria, S.; Gelbukh, A.; Cambria, E. Deep learning-based document modeling for personality detection from text. IEEE Intell. Syst. 2017, 32, 74–79. [Google Scholar] [CrossRef]
- Sun, X.; Liu, B.; Cao, J.; Luo, J.; Shen, X. Who am I? Personality detection based on deep learning for texts. In Proceedings of the 2018 IEEE International Conference on Communications (ICC), Kansas City, MO, USA, 20–24 May 2018; pp. 1–6. [Google Scholar]
- Su, M.-H.; Wu, C.-H.; Zheng, Y.-T. Exploiting turn-taking temporal evolution for personality trait perception in dyadic conversations. IEEE/ACM Trans. Audio Speech Lang. Process. 2016, 24, 733–744. [Google Scholar] [CrossRef]
- Mehta, Y.; Majumder, N.; Gelbukh, A.; Cambria, E. Recent trends in deep learning based personality detection. Artif. Intell. Rev. 2020, 53, 2313–2339. [Google Scholar] [CrossRef] [Green Version]
- Tsiropoulou, E.; Koukas, K.; Papavassiliou, S. A socio-physical and mobility-aware coalition formation mechanism in public safety networks. EAI Endorsed Trans. Future Internet 2018, 4, 154176. [Google Scholar] [CrossRef]
- Vamvakas, P.; Tsiropoulou, E.E.; Papavassiliou, S. On controlling spectrum fragility via resource pricing in 5g wireless networks. IEEE Netw. Lett. 2019, 1, 111–115. [Google Scholar] [CrossRef]
- Molani, S.; Madadi, M.; Wilkes, W. A partially observable Markov chain framework to estimate overdiagnosis risk in breast cancer screening: Incorporating uncertainty in patients adherence behaviors. Omega 2019, 89, 40–53. [Google Scholar] [CrossRef]
- Wu, L.; Morstatter, F.; Hu, X.; Liu, H. Mining misinformation in social media. Big Data Complex. Soc. Netw. 2016, 123–152. [Google Scholar]
Row | Keywords |
---|---|
Test set 1 | “human behavior” OR “personality traits” |
Test set 2 | “data-based approach” OR “multivariate analysis” OR “big data methods” OR “artificial intelligence” OR “machine learning” |
Test set 3 | “textual data” OR “textual feature” OR “textual indicator” |
Search 1 | #1 AND #2 AND #3 |
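The search strategy in the table above combines the three keyword sets with Boolean operators: phrases within a set are joined by OR, and the sets are joined by AND. A minimal sketch of how such a query string could be assembled is given below. The function name `build_query` and the use of straight double quotes and parentheses are assumptions for illustration; the exact syntax accepted by a given bibliographic database may differ.

```python
# Keyword sets copied from the search-strategy table.
TEST_SETS = {
    1: ["human behavior", "personality traits"],
    2: ["data-based approach", "multivariate analysis", "big data methods",
        "artificial intelligence", "machine learning"],
    3: ["textual data", "textual feature", "textual indicator"],
}


def build_query(test_sets):
    """OR the phrases inside each set, then AND the sets together.

    Hypothetical helper: mirrors "#1 AND #2 AND #3" from the table.
    """
    groups = []
    for key in sorted(test_sets):
        phrases = " OR ".join(f'"{phrase}"' for phrase in test_sets[key])
        groups.append(f"({phrases})")
    return " AND ".join(groups)


query = build_query(TEST_SETS)
print(query)
```

Keeping the keyword sets in a single dictionary makes it straightforward to add or remove a term and regenerate the same query for each database searched.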
Reference | Citation | Methods for Identifying Indicators of Human Behavior | Methods for Predicting Human Behavior |
---|---|---|---|
Adali & Golbeck | [31] | Correlation analysis | Gaussian Process and ZeroR regression algorithms |
Agarwal & Sureka | [32] | Naïve Bayes, random forest, and decision tree classifiers | |
Alam et al. | [33] | Support vector machine, Bayesian logistic regression, binary logistic regression, and multinomial naïve Bayes sparse modeling | |
Alsadhan & Skillicorn | [34] | New approach based on the frequency and similarity of the words among each of the Big Five personality traits | |
Amichai-Hamburger & Vinitzky | [23] | Regression analysis, two-way ANOVA | |
Annisette & Lafreniere | [35] | Correlation analysis and hierarchical multiple regression | |
Argamon et al. | [36] | Sequential minimal optimization, support vector machine algorithms | |
Ashton | [37] | Correlation analysis | |
Bachrach et al. | [38] | Correlation analysis | Multivariate linear regression, support vector machine algorithms, and decision stump algorithms |
Bai et al. | [39] | Support vector machine, naïve Bayes, decision tree algorithms | |
Bai et al. | [40] | Correlation analysis | Multi-task regression and incremental regression |
Ben-Ari & Hammond | [41] | Random forest algorithm | |
Bhattacharya et al. | [42] | Correlation analysis | |
Celli & Poesio | [43] | Correlation analysis | Unsupervised personality recognition system |
Celli & Polonio | [44] | Correlation analysis | Unsupervised personality recognition system |
Celli & Rossi | [45] | Correlation analysis | Unsupervised personality recognition system |
Chapsky | [46] | Bayesian network | |
Chittaranjan et al. | [47] | Correlation and multiple regression analysis | Decision trees and support vector machine classifiers |
Chittaranjan et al. | [48] | Correlation and multiple regression analysis | Support vector machine classifier |
Devaraj et al. | [49] | Correlation and multiple regression analysis | |
Farnadi et al. | [50] | Support vector machine, nearest neighbor with k = 1, and naïve Bayes algorithms | |
Farnadi et al. | [8] | Correlation analysis | Decision tree algorithm and support vector machine algorithm |
Fatima et al. | [9] | Decision trees, random forest-based method, and support vector machine classifier | |
Gao et al. | [51] | Gaussian process, M5′ rules, and Pace Regression | |
Golbeck et al. | [21] | Correlation analysis | M5′ rules and Gaussian process algorithms |
Golbeck | [52] | Receptiviti API | |
Gupta & Chatterjee | [16] | Rough sets and LEM algorithm | |
Hammond & Laundry | [53] | Software based on support vector conditional random field classifier | |
He et al. | [54] | Text classification algorithm named product score model | |
He et al. | [55] | Logistic regression and classification tree | |
Holtgraves | [56] | Correlation analysis | |
Hu et al. | [57] | New model based on detecting the relationship between personality and topic | |
Iacobelli & Culotta | [58] | Conditional random fields model, sequential minimal optimization, and naïve Bayes algorithms | |
Jenkins-Guarnieri et al. | [59] | Correlation analysis | |
Kaati et al. | [60] | Machine learning algorithms including AdaBoost and classification trees | |
Kalghatgi et al. | [61] | Neural network algorithm | |
Kartelj et al. | [62] | Automated Personality Classification model based on linear regression, M5′ classification tree, M5′ regression tree, and support vector machine | |
Kermanidis | [63] | Support vector machine classifier | |
Kern et al. | [64] | Correlation analysis | |
Kosinski et al. | [65] | Dimensionality reduction and linear regression analysis | |
Kosinski et al. | [66] | Logistic regression classifier | |
Krämer & Winter | [67] | Correlation analysis | |
Krishnamurthy et al. | [68] | Correlation analysis | |
Lima & de Castro | [69] | Bayesian personality predictor model based on naïve Bayes algorithm | |
Lima & de Castro | [70] | Personality prediction in social media data (PERSOMA) approach | |
Balmaceda et al. | [71] | Correlation analysis, Apriori algorithm, and KNIME tool | |
Markovikj et al. | [14] | Pearson correlation analysis | Support vector machines, sequential minimal optimization, and boosting algorithms |
Moore & McElroy | [72] | Hierarchical regression and correlation analysis | |
Neuman & Cohen | [73] | New vectorial semantics approach | |
Neuman et al. | [74] | Correlation analysis | Pedesis tool |
Nie et al. | [75] | Local linear semi-supervised regression algorithm | |
Nokhbeh Zaeem et al. | [76] | Approach based on statistical analysis and the Identity Threat Assessment and Prediction (ITAP) algorithm | |
Oberlander & Nowson | [77] | Support vector machines, naïve Bayes algorithms | |
Ortigosa et al. | [19] | TP2010 application | Naïve Bayes, classification tree algorithms |
Ou et al. | [78] | ANOVA | Support vector machine |
Pabón et al. | [79] | IBM Watson Personality Insights | |
Panicheva et al. | [80] | Correlation analysis | |
Park et al. | [81] | Correlation analysis | Regression model |
Peng et al. | [82] | Support vector machine | |
Pramodh & Vijayalata | [83] | Data-based approach based on word counts | |
Pratama & Sarno | [84] | Naïve Bayes, K-nearest neighbors, and support vector machine algorithms | |
Preoţiuc-Pietro et al. | [85] | Logistic regression classifiers | |
Qiu et al. | [86] | Correlation analysis | |
Quercia et al. | [87] | Correlation analysis | M5′ rules algorithm |
Quercia et al. | [88] | Correlation analysis | Simple regression model |
Reips & Garaizar | [89] | Text mining with the iScience Maps web service | |
Santos & Paraboni | [90] | Naïve Bayes classification and logistic regression | |
Schwartz et al. | [91] | Correlation analysis and standardized linear regression | Support vector machine and regression analysis |
Seibert & Kraimer | [92] | Correlation analysis | |
Seidman | [93] | Correlation analysis | |
Skues et al. | [94] | Correlation analysis | |
Souri et al. | [95] | Naïve Bayes, neural network, decision tree, and support vector machine classifiers | |
Srividya & Sowjanya | [96] | Simple correlation analysis and comparison of the frequency with a threshold | |
Sumner et al. | [25] | Correlation analysis | Support vector machine using sequential minimal optimization and a polynomial kernel, random forest, and naïve Bayes classifier |
Tazghini & Siedlecki | [97] | Hierarchical regression and correlation analysis | |
Uddin | [98] | Series of algorithms and models called PERFECT Algorithm Engine (PAE) | |
Wald et al. | [22] | Numeric prediction models including linear regression, REPTree, and decision tables | |
Wang | [99] | Correlation analysis | |
Wei et al. | [100] | Proposed Heterogeneous Information Ensemble framework | |
Winter et al. | [101] | Correlation analysis and hierarchical regression analyses | |
Liu et al. | [24] | Naïve Bayes classification | |
Yarkoni | [102] | Correlation analysis | |
Yoon et al. | [103] | Term frequency analysis | |
Zhou et al. | [104] | Medical information extraction (medie) system using decision tree-based text classification | |
Kumar & Gavrilova | [105] | XGBoost and support vector machine algorithms | |
Yang & Huang | [106] | Personality Recognizer tool based on linear regression, M5′ model tree, M5′ regression tree, and support vector machine algorithms | |
Zheng & Wu | [107] | Semi-supervised learning methods called pseudo multi-view co-training |
Reference | Features | Classification Method (Best Performance Method) | Accuracy (Best Case, %) |
---|---|---|---|
[31] | Pairwise network bandwidth features | Gaussian Process and ZeroR regression algorithms | |
[32] | Linguistic features | Random forest algorithm | 78.0 |
[33] | TF-IDF features | Multinomial naïve Bayes sparse modeling | 61.8 |
[36] | Linguistic features | Sequential minimal optimization algorithm | 58.0 |
[38] | Facebook’s pre-defined features | Multivariate regression | |
[39] | Facebook’s pre-defined features | Support vector machine, naïve Bayes, and decision tree algorithms | 81.1 |
[46] | Facebook’s pre-defined features | Bayesian network | |
[50] | Facebook’s pre-defined features, LIWC features | Support vector machine with a linear kernel, naïve Bayes algorithms | 66.0 |
[9] | LIWC features | Random forest classification | |
[47] | Sequential backward feature selection algorithm | Support vector machine classifier | 83.0 |
[51] | Word-based features, character-based features, and LIWC features | M5′ rules, Pace regression and Gaussian process | |
[21] | Facebook’s pre-defined features | M5′ rules and Gaussian process algorithms | 65.0 |
[16] | LIWC and NLTK features | Rough sets and LEM algorithm | 84.67 |
[53] | Linguistic features | Software based on support vector conditional random fields classifier | |
[54] | Using chi-square selection algorithm to select top keywords | Text classification algorithm named product score model | 80.0 |
[55] | Textual features of posts on Facebook | Logistic regression and classification tree | 62.9 |
[60] | Psychologically meaningful features, according to LIWC | AdaBoost algorithm | 92.2 |
[65] | Facebook’s pre-defined features | Logistic/linear regression | 78.0 |
[66] | Facebook’s pre-defined features | Logistic/linear regression | 75.0 |
[63] | Modern Greek textual features | Support vector machine classifier | 86.0 |
[14] | Facebook’s pre-defined features | MultiBoostAB and AdaBoostM1 algorithms | |
[73] | Textual features | Vectorial semantics approach (tree-based classification model) | 64.0 |
[19] | Facebook’s pre-defined features | Naïve Bayes and classification tree algorithms | 82.8 |
[81] | Linguistic features | Regression model | |
[82] | Linguistic features | Support vector machine algorithm | 73.5 |
[85] | Demographic features and linguistic features | Binary logistic regression classifiers with elastic net regularization | 81.9 |
[87] | Facebook’s pre-defined features | M5′ rules algorithm | |
[95] | Facebook’s pre-defined features | Boosting-decision tree classifier | 82.0 |
[25] | LIWC features | Support vector machine, random forest, naïve Bayes classifiers | 64.7 |
[22] | Facebook’s pre-defined features | Linear regression, REPTree, and decision table algorithms | 75.0 |
[105] | TF-IDF features and GloVe word embedding | XGBoost and support vector machine algorithms | 85.0 |
[106] | Linguistic features | Personality Recognizer tool based on linear regression, M5′ model tree, M5′ regression tree, and support vector machine algorithms |
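Several rows in the table above ([33], [105]) list TF-IDF features as classifier input. As a rough illustration of what such features look like, the sketch below computes TF-IDF weights for tokenized posts using only the standard library. It is not the implementation used in any of the cited studies. The weighting variant (term count divided by document length, multiplied by the natural log of N over document frequency), the function name `tf_idf`, and the toy posts are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy posts, invented for illustration only.
posts = [
    "i love meeting new people".split(),
    "i worry about new deadlines".split(),
    "i love quiet evenings".split(),
]
weights = tf_idf(posts)

# Terms ranked by weight for the first post.
print(sorted(weights[0].items(), key=lambda kv: kv[1], reverse=True))
```

Under this variant, a term that occurs in every post (here "i") receives a weight of zero, while a term unique to one post receives the highest weight in that post, which is why TF-IDF tends to surface the words that distinguish one author's text from another's.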
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Davahli, M.R.; Karwowski, W.; Gutierrez, E.; Fiok, K.; Wróbel, G.; Taiar, R.; Ahram, T. Identification and Prediction of Human Behavior through Mining of Unstructured Textual Data. Symmetry 2020, 12, 1902. https://doi.org/10.3390/sym12111902