DSpace Repository

Framework for Classification of Urdu News based on their headlines (T-0683) (MFN 4230)

dc.contributor.author Kashif Ahmed, 01-244121-004
dc.date.accessioned 2017-07-20T07:36:54Z
dc.date.available 2017-07-20T07:36:54Z
dc.date.issued 2014
dc.identifier.uri http://hdl.handle.net/123456789/2866
dc.description Supervised by Dr. Shehzad Khalid en_US
dc.description.abstract Automatic text classification has gained immense importance because of its many applications in data mining and information technology. It plays a vital role in fields such as spam filtering, news classification, and noise reduction. A large body of work exists on classifying text, especially at the document level, in various languages (e.g. English news classification and Persian text classification), but work on classifying short Urdu text or Urdu news headlines has not been carried out so far. To classify Urdu text data, several pre-processing steps, such as stop word removal, tokenization, and stemming, are of prime importance. After the required pre-processing, the desired features are selected and then classified using existing text classification methods such as SVM and Naive Bayes. In our proposed work, we have developed a system that classifies Urdu news headlines into one of the pre-defined classes. A systematic, module-based approach is proposed. In the first module, we perform basic pre-processing steps on the training data. This comprises splitting headlines into segments by tokenization, cleaning the data of diacritics and meaningless words through a text sanitization process, removing stop words using existing stop word lists for the Urdu language, and stemming words using an existing generic stemming technique for the Urdu language. In the second module, an SVM-based model is learned using a feature vector generated by combining all words from each class after applying a threshold value. In the third and last module, the system pre-processes unseen news headlines and classifies the Urdu headlines of the test data into the pre-defined classes by using the maximum index of the feature vector. The word with the maximum index value in the feature vector is classified into that word's particular class. Experimental evaluation and results of our proposed system are presented in tabular form. To demonstrate the effectiveness of our proposed system, a competitor analysis has been carried out by deploying the competitor system on our self-generated datasets. en_US
dc.language.iso en en_US
dc.publisher Software Engineering, Bahria University Engineering School Islamabad en_US
dc.relation.ispartofseries MS SE;T-0683
dc.subject Software Engineering en_US
dc.title Framework for Classification of Urdu News based on their headlines (T-0683) (MFN 4230) en_US
dc.type MS Thesis en_US

