Abstract:
In daily life, people's opinions and experiences are important sources of information.
Sentiment analysis is the task of measuring the sentiment expressed in those opinions. It is
used to analyze information from social media and other related sources. Text
is the main means of communication on the Internet in the modern digital era. Sentiment analysis
captures users' views, moods, and opinions about the specific services provided by a
business organization in real time. This research focuses on Roman Urdu reviews (a widely used
form of review) obtained from social media posts. The reviews are classified into three basic
classes: negative, positive, and neutral. The proposed method is compared with the results of
machine learning approaches, and the experimental results are promising. Our study compares the
results of different classifiers that extract product characteristics and the associated opinions: SVM,
Random Forest, Decision Tree, and Naïve Bayes. The main focus of this research is opinion mining
of customer reviews written in Roman Urdu, applying different methods to classify
the opinions. The final aim is to compare the results of
different machine learning classifiers, identify the best approach for sentiment analysis on the
Roman Urdu dataset, and develop a pre-processing algorithm that removes noise from the
data. To filter noise from the textual reviews, the pre-processing steps must be performed
in sequence. In this process, raw reviews are taken as input and
processed reviews are produced as output. Filtering removes all special characters, excess white space, and single
characters from the dataset, which also reduces the size of the data. Further, we have
analyzed the processed data using machine learning techniques. Support vector machine
(SVM), Naïve Bayes, Decision Tree, and Random Forest classifiers are used to measure accuracy on the
dataset, as well as the precision, recall, and F-measure scores of the different
classifiers.
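
The pre-processing and evaluation steps described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names `preprocess_review` and `precision_recall_f1`, the sample Roman Urdu reviews, and the choice to keep digits and treat all punctuation as special characters are assumptions made for illustration only.

```python
import re
from typing import List, Tuple


def preprocess_review(text: str) -> str:
    """Clean one review (hypothetical helper, not the paper's algorithm).

    Steps, applied in sequence:
    1. replace special characters (anything that is not a letter,
       digit, or whitespace) with a space;
    2. split on whitespace, which collapses repeated spaces;
    3. drop tokens that consist of a single character.
    """
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    tokens = [tok for tok in text.split() if len(tok) > 1]
    return " ".join(tokens)


def precision_recall_f1(
    y_true: List[str], y_pred: List[str], label: str
) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure for one class (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up example reviews, not taken from the paper's dataset.
    raw_reviews = [
        "Yeh   phone bohat acha hai!!! :)",
        "Service a bilkul bekar thi...",
    ]
    for review in raw_reviews:
        print(preprocess_review(review))

    # Made-up gold labels and predictions for the three classes.
    gold = ["positive", "negative", "neutral", "positive"]
    predicted = ["positive", "positive", "neutral", "negative"]
    print(precision_recall_f1(gold, predicted, "positive"))
```

In practice, the cleaned reviews would be vectorized and passed to each of the four classifiers, and the per-class scores would be computed for the negative, positive, and neutral classes in turn.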