Abstract:
The potential applications of personality prediction from textual data in psychology, marketing, and human-computer interaction have attracted considerable attention in recent years. While previous research has provided useful insights, this work takes a different approach to personality prediction by combining transformer-based models (BigBird, ALBERT, and DistilBERT) with NLP statistical features. To the best of our knowledge, these models have not previously been used in this context. The goal of this study is to examine and compare the performance of these models, enhanced with NLP statistical features, against conventional methods in predicting personality traits on two textual datasets: the Facebook dataset and the essay dataset. In doing so, the study aims to clarify the potential and the challenges of using transformer-based models and NLP statistics for personality trait prediction, and the advantages they offer over established techniques. We used two classifiers, BiGRU and BiLSTM, to classify the five traits of the Big Five personality model on the Facebook and essay datasets. When combined with NLP statistical features and BiLSTM, BigBird achieves F1-scores of 0.82, 0.76, 0.74, 0.84, and 0.81 for the traits extraversion (EXT), neuroticism (NEU), agreeableness (AGR), conscientiousness (CON), and openness (OPN), respectively, with accuracies of 85.16%, 87.39%, 92.35%, 98.48%, and 98.33% on the Facebook dataset. These findings illustrate the effectiveness of transformer-based models augmented with NLP statistics in predicting personality across the two datasets. Our evaluation also reports accuracy and F1-score for each trait and dataset, providing a full assessment of our models' performance.
This study contributes to the growing field of personality prediction by introducing advanced approaches and demonstrating the effectiveness of transformer-based models in understanding human behavior through textual data.
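
The per-trait accuracy and F1-score reporting described above can be illustrated with a minimal sketch. It assumes each trait is treated as an independent binary label, which the abstract does not state explicitly; the function name `per_trait_scores` and the toy labels are hypothetical and are not taken from the study's data or code.

```python
from typing import Dict, List

# Trait order as listed in the abstract.
TRAITS = ["EXT", "NEU", "AGR", "CON", "OPN"]


def per_trait_scores(
    y_true: List[List[int]], y_pred: List[List[int]]
) -> Dict[str, Dict[str, float]]:
    """Compute accuracy and F1 separately for each trait.

    Assumption: each row holds one 0/1 label per trait, in TRAITS order.
    """
    scores: Dict[str, Dict[str, float]] = {}
    for j, trait in enumerate(TRAITS):
        tp = fp = fn = tn = 0
        for true_row, pred_row in zip(y_true, y_pred):
            t, p = true_row[j], pred_row[j]
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
            else:
                tn += 1
        total = tp + fp + fn + tn
        accuracy = (tp + tn) / total if total else 0.0
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives at all.
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        scores[trait] = {"accuracy": accuracy, "f1": f1}
    return scores


# Hypothetical toy labels (4 texts x 5 traits), for illustration only.
y_true = [[1, 0, 1, 0, 1], [0, 0, 1, 0, 1], [1, 1, 0, 0, 0], [0, 0, 1, 1, 1]]
y_pred = [[1, 0, 1, 0, 1], [0, 0, 1, 0, 1], [0, 1, 0, 0, 0], [0, 0, 1, 0, 1]]

results = per_trait_scores(y_true, y_pred)
for trait in TRAITS:
    print(trait, results[trait])
```

In this toy example the CON column has an accuracy of 0.75 but an F1-score of 0, because the single positive case is missed. This is why the two metrics are reported side by side for each trait.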