Abstract:
The objective of this project is to identify sarcastic tweets on Twitter by comparing
the text of each tweet with its emoticons. The report investigates various techniques
used for identifying sarcastic tweets. The stages of tweet processing, namely
pre-processing, segmentation of text and emoticons, and feature extraction, are also
discussed. Finally, the output of the algorithms identifies the sarcastic tweets.
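
The segmentation and comparison steps can be illustrated with a minimal sketch. It
assumes that emoticons are drawn from small happy and sad sets and that a tweet is
flagged when the polarity of its text and of its emoticons have opposite signs. The
emoticon sets, the function names, and the mismatch rule are all hypothetical and
are not taken from the report's implementation.

```python
import re

# Hypothetical emoticon sets; the report's actual lists are not given.
HAPPY = {":)", ":-)", ":D", "😊"}
SAD = {":(", ":-(", "😢"}

# Longest alternatives first so ":-)" is tried before ":)".
_EMOTICON_RE = re.compile(
    "|".join(re.escape(e) for e in sorted(HAPPY | SAD, key=len, reverse=True))
)


def split_tweet(tweet):
    """Return (text with emoticons removed, list of emoticons found)."""
    emoticons = _EMOTICON_RE.findall(tweet)
    text = _EMOTICON_RE.sub(" ", tweet)
    return " ".join(text.split()), emoticons


def emoticon_polarity(emoticons):
    """Return +1 if happy emoticons dominate, -1 if sad, 0 if none or tied."""
    score = sum(1 if e in HAPPY else -1 for e in emoticons)
    return (score > 0) - (score < 0)


def is_sarcastic(text_polarity, emoticons):
    """Assumed rule: text and emoticon polarity have opposite signs."""
    emo = emoticon_polarity(emoticons)
    txt = (text_polarity > 0) - (text_polarity < 0)
    return emo != 0 and txt != 0 and emo != txt
```

Here `text_polarity` is any signed sentiment score for the text part. It could come
from a sentiment library; TextBlob, for example, exposes a score in the range -1 to 1
as `TextBlob(text).sentiment.polarity`.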
The system first pre-processes a dataset of more than 9,000 tweets. We selected the
tweets that contain only happy or sad emoticons and then separated the text from the
emoji. Using the TextBlob Python library, we calculated the polarity of the text
(i.e. positive, negative, or neutral).
We then cleaned the text (e.g. by removing stop words and punctuation) and used a
count vectorizer to create a bag-of-words model, from which we obtain a sparse
matrix. After that, we split the dataset into training and testing sets and applied
the Gaussian Naive Bayes and Multinomial Naive Bayes algorithms to create a model for
predicting the polarity and emoticon of unseen text.
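
The bag-of-words and classification steps described above might look roughly like
the following sketch, assuming scikit-learn is the count vectorizing library. The
toy tweets, the labels, the 75/25 split, and the variable names are illustrative
assumptions; they are not the report's dataset or settings.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB

# Hypothetical toy data: tweet text with emoticons already removed,
# labelled by the emoticon class that accompanied each tweet.
texts = [
    "love this sunny morning",
    "best concert ever tonight",
    "so proud of my team",
    "what a wonderful surprise",
    "missed the bus again",
    "my phone screen cracked",
    "worst service ever today",
    "feeling sick and tired",
]
labels = ["happy", "happy", "happy", "happy", "sad", "sad", "sad", "sad"]

# Bag-of-words model: fit_transform returns a sparse document-term matrix.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(texts)

# Assumed 75/25 split, stratified so both classes appear in each part.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels
)

# MultinomialNB accepts the sparse count matrix directly.
mnb = MultinomialNB().fit(X_train, y_train)
mnb_pred = mnb.predict(X_test)

# GaussianNB requires dense input, so the matrices are converted first.
gnb = GaussianNB().fit(X_train.toarray(), y_train)
gnb_pred = gnb.predict(X_test.toarray())
```

Converting to a dense array is workable for a few thousand tweets, but memory use
grows with vocabulary size, which is one reason Multinomial Naive Bayes is usually
the more natural fit for count features.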