A Detailed Investigation of Impact of Text Preprocessing and Attribute Selection Methods on The Performance of Classification Algorithms (T-0745) (MFN 6890)

dc.contributor.author Zeeshan Saleem, 01-241161-017
dc.date.accessioned 2018-08-29T08:08:09Z
dc.date.available 2018-08-29T08:08:09Z
dc.date.issued 2018
dc.identifier.uri http://hdl.handle.net/123456789/7381
dc.description Supervised by Dr. Kashif Naseer Qureshi en_US
dc.description.abstract Text classification is one of the most important tasks in text mining and machine learning. As the volume of data on the World Wide Web grows, the significance of this task increases. Understanding and classifying the digital data available on the internet by hand requires enormous human effort. Text classification is the task of assigning text files to different classes. The text available on the internet is unstructured, which makes it harder to understand and classify for useful purposes. This research study investigates how text preprocessing techniques affect the quality of text classification results. Preprocessing techniques such as tokenization, stemming, and stop word removal are studied in detail. Furthermore, unigram, bigram, and trigram attributes are tested. Attribute selection methods and their impact on the text classification results are also examined. Two popular news datasets, 20NewsGroups and BBC News, are used; both consist of short news text files. To carry out a detailed investigation, 11 versions of each dataset are created, and different preprocessing techniques are applied to each version in order to understand the impact of each technique on the classification results. This study tackles the text classification problem. en_US
dc.language.iso en en_US
dc.publisher Software Engineering, Bahria University Engineering School Islamabad en_US
dc.relation.ispartofseries MS SE;T-0745
dc.subject Software Engineering en_US
dc.title A Detailed Investigation of Impact of Text Preprocessing and Attribute Selection Methods on The Performance of Classification Algorithms (T-0745) (MFN 6890) en_US
dc.type MS Thesis en_US
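
The preprocessing steps named in the abstract (tokenization, stop word removal, stemming, and unigram/bigram/trigram attributes) can be illustrated with a minimal sketch. This is not the thesis's implementation: the stop word list, the suffix-stripping rule (a crude stand-in for a real stemmer such as Porter's), and all function names below are assumptions made for illustration only.

```python
import re

# Hypothetical, deliberately tiny stop word list (illustration only).
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Strip one common English suffix, keeping at least 3 characters.

    Assumption: a crude placeholder, not a real stemming algorithm.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def ngrams(tokens, n):
    """Build word n-grams (n=1 unigrams, n=2 bigrams, n=3 trigrams)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(text, n=1):
    """Tokenize, remove stop words, stem, then build n-gram attributes."""
    tokens = [naive_stem(t) for t in remove_stop_words(tokenize(text))]
    return ngrams(tokens, n)
```

Switching individual steps on or off in a pipeline like this is one plausible way to produce differently preprocessed versions of a dataset for comparison; how the thesis built its 11 versions is not stated in this record.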