Abstract:
Recent years have seen growing implementation and analysis of classification models for political discussions on social media platforms, spanning traditional machine learning and state-of-the-art transformer-based deep learning architectures. For this research, we collected data from the Twitter API, focusing on tweets about three major political parties in Pakistan: PMLN (Pakistan Muslim League Noon), PTI (Pakistan Tehreek e Insaf), and PPP (Pakistan People’s Party). The initial dataset consisted of 17,452 tweets, which were pre-processed and filtered down to a final annotated dataset of 3,617 relevant entries. Three neutral annotators labeled the data following defined annotation guidelines. The study employed rigorous preprocessing, including text cleaning, tokenization, and splitting of the data into training, validation, and test sets. We evaluated deep learning models (CNNs, RNNs, GRU, and LSTM), transformer models (RoBERTa and BERT), and a conventional model, the support vector machine (SVM). Model efficacy was measured with Precision, Recall, Accuracy, and F1-score. The findings show that the transformer-based approach performed best, outperforming both the other deep learning models and the traditional approach. These results offer insight into the accuracy and efficiency of the evaluated models in text classification tasks within the political domain of Pakistan.
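To make the preprocessing and evaluation steps named above concrete, the following is a minimal sketch, not the pipeline used in this study. The cleaning rules (removing URLs, mentions, and the `#` sign), the use of party names as class labels, the choice of macro-averaging, and the function names `clean_tweet`, `tokenize`, and `macro_scores` are all illustrative assumptions; the abstract does not specify any of them.

```python
import re
from collections import Counter


def clean_tweet(text):
    """Hypothetical cleaning rules: strip URLs, @mentions, and '#' signs."""
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"@\w+", " ", text)          # drop user mentions
    text = text.replace("#", " ")              # keep the hashtag word only
    return re.sub(r"\s+", " ", text).strip().lower()


def tokenize(text):
    """Whitespace tokenization of the cleaned tweet (an assumed scheme)."""
    return clean_tweet(text).split()


def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label wrongly
            fn[truth] += 1  # missed the true label

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels for illustration only; these are not results from the study.
y_true = ["PTI", "PTI", "PPP", "PMLN"]
y_pred = ["PTI", "PPP", "PPP", "PMLN"]
scores = macro_scores(y_true, y_pred)
```

Macro-averaging weights each class equally regardless of its size, which is one common choice when classes are imbalanced; a weighted or micro average would be an equally plausible reading of the metrics listed.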