Abstract:
Nowadays we are surrounded by fake news. Fake or disinformative news spreads rapidly among communities around the globe because social media platforms such as Facebook, Twitter, and Instagram have become part of our daily routine. Fake news is described as erroneous or deceptive assertions, disinformation, or propaganda disguised as news. The idea of fake news was already in circulation before the internet and other technical systems existed, and spreading fake news and misleading data has long played a significant role. Those who spread it have an agenda, whether they are lower-level or higher-level employees. These days, anyone can share content on the internet through blogs, news reports, and social media. Because data is so readily available, false information can be distributed quickly. We have access to all kinds of data and news sites, but the main difficulty is that, as technology advances, it becomes hard to identify even genuine news. Every day we read news on many topics without knowing whether it is true. Researchers have therefore been drawn to the problem of detecting fake or disinformative news among collections of hundreds of thousands of news items. They have developed different techniques to identify fake news and, based on their studies, have proposed methods such as supervised and unsupervised learning, although every study has its own limitations. In this research we use two datasets, LIAR and ISOT. The first challenge is working with a multi-class dataset, since LIAR is multi-class. The second is that LIAR has both content and context features, whereas ISOT has binary classes with only content features. A further challenge is that the classes in the LIAR dataset are unbalanced.
Initially, we run a six-step preprocessing procedure. After preprocessing, we extract context and content features and apply N-grams using a one-hot encoder. We then apply TF-IDF and use a stratified shuffle split, which provides train/test indices for dividing the data into training and test sets. We use four state-of-the-art classification models: Linear Regression, Linear SVC, Passive Aggressive, and SGDC. We achieve 37% accuracy on the LIAR dataset and 84% on the ISOT dataset.
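
The feature and splitting steps described above can be illustrated with a small, dependency-free sketch. It is not the implementation used in this research: the function names (`word_ngrams`, `tfidf`, `stratified_shuffle_split`), the toy headlines, the unigram-plus-bigram range, and the smoothed idf formula are all assumptions chosen for illustration, and the classifiers are left out. The point is only to show how word N-grams are weighted by TF-IDF and how a stratified shuffle split keeps class proportions similar in the training and test sets, which matters when classes are unbalanced.

```python
import math
import random
from collections import Counter, defaultdict


def word_ngrams(text, n_min=1, n_max=2):
    """Return all word n-grams of length n_min..n_max from lowercased text."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs, n_min=1, n_max=2):
    """Compute one sparse {ngram: weight} dict per document.

    Assumption: raw term frequency times the smoothed idf
    ln((1 + N) / (1 + df)) + 1. Other TF-IDF variants exist.
    """
    counts = [Counter(word_ngrams(d, n_min, n_max)) for d in docs]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each document counts once per n-gram
    n_docs = len(docs)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0
           for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def stratified_shuffle_split(labels, test_size=0.2, seed=0):
    """Return (train_idx, test_idx), sampling test items class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        # At least one test item per class so rare classes are not dropped.
        n_test = max(1, round(len(idx) * test_size))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    rng.shuffle(train_idx)
    rng.shuffle(test_idx)
    return train_idx, test_idx


# Hypothetical toy data: four "real" and two "fake" headlines (unbalanced).
docs = [
    "senate passes budget bill",
    "city council approves budget",
    "court upholds budget ruling",
    "governor signs budget law",
    "aliens wrote the budget",
    "budget cures all diseases",
]
labels = ["real", "real", "real", "real", "fake", "fake"]

vectors = tfidf(docs)
train_idx, test_idx = stratified_shuffle_split(labels, test_size=0.5, seed=0)
```

In this toy corpus the word "budget" occurs in every headline, so it receives the lowest idf, while a bigram such as "senate passes" occurs once and is weighted more heavily. The split places two "real" items and one "fake" item in the test set, mirroring the 2:1 ratio of the full data.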