Abstract:
In this age of information, news is an important part of our daily lives, and the need to
stay up to date with everyday events grows day by day. However,
different people are interested in different types of news. A system is therefore
required that can classify news by category, making it easier for users to
find news that is relevant to them. Such systems already exist for English,
but not for Urdu: work on Urdu
text classification is very limited, as classifying Urdu text can be a very challenging task. In this
project, we use precompiled Urdu news datasets. Our datasets contain news
from six categories: Health, Science, Politics, Entertainment, Business, and
Sports. Before the machine learning algorithms can work on the data, we first
apply pre-processing techniques such as stop-word removal, followed by feature
extraction. Feature extraction is performed using TF-IDF and count vectorization
techniques. We will use an LSTM model for news classification, targeting 80%
accuracy.
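
The pre-processing and feature extraction steps mentioned above can be sketched roughly as follows. This is a minimal illustration only, not the project's actual pipeline: the stop-word list, the romanized placeholder headlines, and the function names (preprocess, count_vectorize, tfidf) are all assumptions made for the example, and the smoothed IDF formula is one common variant among several.

```python
import math
from collections import Counter

# Hypothetical stop-word list (romanized placeholders, not the project's list).
STOP_WORDS = {"ka", "ki", "ke", "mein", "hai", "aur", "ne"}


def preprocess(text):
    """Split on whitespace and drop stop words."""
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def count_vectorize(docs):
    """Return (sorted vocabulary, per-document term-count rows)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def tfidf(count_rows):
    """Weight raw counts by a smoothed inverse document frequency."""
    n_docs = len(count_rows)
    n_terms = len(count_rows[0]) if count_rows else 0
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in count_rows if row[j] > 0) for j in range(n_terms)]
    # Assumed smoothing: idf = ln((1 + N) / (1 + df)) + 1.
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    return [[c * w for c, w in zip(row, idf)] for row in count_rows]


# Toy romanized headlines standing in for Urdu news text (assumption).
corpus = [
    "team ne match jeet liya",
    "hukumat ka naya budget",
    "team ka naya match",
]

docs = [preprocess(text) for text in corpus]
vocab, counts = count_vectorize(docs)
weights = tfidf(counts)
```

Here counts holds the count-vectorized features and weights the TF-IDF features. Terms that occur in fewer documents (such as "jeet") receive a higher weight than terms shared across documents (such as "team").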