Welcome to the Bahria University DSpace digital repository. DSpace is a digital service that collects, preserves, and distributes digital material. Repositories are important tools for preserving an organization's legacy; they facilitate digital preservation and scholarly communication.
| dc.contributor.author | Zeeshan Saleem, 01-241161-017 | |
| dc.date.accessioned | 2018-08-29T08:08:09Z | |
| dc.date.available | 2018-08-29T08:08:09Z | |
| dc.date.issued | 2018 | |
| dc.identifier.uri | http://hdl.handle.net/123456789/7381 | |
| dc.description | Supervised by Dr. Kashif Naseer Qureshi | en_US |
| dc.description.abstract | Text classification is one of the most important tasks in text mining and machine learning. As the volume of data on the World Wide Web grows, the significance of this task increases. Understanding and classifying the digital data available on the internet requires considerable human effort. Text classification is the task of assigning text files to different classes. The text available on the internet is unstructured, which makes it harder to understand and classify for useful purposes. This study investigates how text preprocessing techniques affect the quality of text classification results. Preprocessing techniques such as tokenization, stemming, and stop-word removal are studied in detail. Unigram, bigram, and trigram attributes are also tested, and attribute selection methods are examined for their impact on the classification results. Two popular news datasets are used, 20NewsGroups and BBC News, in which each news item is a short text file. To carry out a detailed investigation, 11 versions of each dataset are created, and different preprocessing techniques are applied to each in order to understand the impact of each technique on the classification results. This study tackles the text classification problem. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | Software Engineering, Bahria University Engineering School Islamabad | en_US |
| dc.relation.ispartofseries | MS SE;T-0745 | |
| dc.subject | Software Engineering | en_US |
| dc.title | A Detailed Investigation of Impact of Text Preprocessing and Attribute Selection Methods on The Performance of Classification Algorithms (T-0745) (MFN 6890) | en_US |
| dc.type | MS Thesis | en_US |
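
The preprocessing steps named in the abstract (tokenization, stop-word removal, stemming, and unigram/bigram/trigram attributes) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the tiny stop-word list, and the crude suffix-stripping stemmer are all assumptions made for illustration, and a real study would more likely use an established stemmer such as Porter's.

```python
import re

# Tiny illustrative stop-word list (an assumption, not the list used in the thesis).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "on", "and"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Strip one common suffix; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def ngrams(tokens, n):
    """Join every run of n consecutive tokens into one attribute string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(text, n=1, use_stop_words=True, use_stemming=True):
    """Apply the selected preprocessing steps and return n-gram attributes."""
    tokens = tokenize(text)
    if use_stop_words:
        tokens = remove_stop_words(tokens)
    if use_stemming:
        tokens = [naive_stem(t) for t in tokens]
    return ngrams(tokens, n)
```

Toggling `use_stop_words`, `use_stemming`, and `n` yields differently preprocessed versions of the same text, which loosely mirrors the idea of comparing several dataset versions; the actual 11 configurations used in the thesis are not specified in this record.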