TEXT CLASSIFICATION USING TF-IDF AND MACHINE LEARNING ALGORITHMS

Khan, Faraz Ahmed Reg # 48474; Hussain, Zaeem Shakir Reg # 48438; Rehman, Abdul Reg # 48414

URI: http://hdl.handle.net/123456789/16668

Date: 2020

Abstract:

This system was created to meet the need for classifying newspaper articles. For this purpose, we collected many newspaper articles from many different newspapers. A given article is compared with the articles stored in the dataset, and if its descriptors and key points match those of articles in our dataset, the details of those specific articles are sent to the system. Some outputs will be available which show that articles are recognized with good accuracy. Support Vector Machines (SVM) and their variants have been gaining momentum in the machine learning community. In this paper, we present a quantitative analysis of established SVM-based classifiers on a multi-category text classification problem. The dataset is first converted into the required format by preprocessing, which involves tokenization and the removal of irrelevant data. The feature set is constructed as a Term Frequency-Inverse Document Frequency (TF-IDF) matrix, so that a representative vector can be obtained for each document. Experimentally, among the different models, SVM gives the best accuracy; after building the models, we ranked the given articles.
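
The preprocessing and feature-construction steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the smoothed IDF variant, the helper names (`tokenize`, `tfidf_matrix`, `cosine`) and the three toy headlines are all assumptions made for illustration. The sketch only builds the TF-IDF matrix and compares documents by cosine similarity; the SVM classifier evaluated in the thesis is not reproduced here.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not say which words were removed.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "for"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_matrix(docs):
    """Return (vocabulary, matrix) with one TF-IDF row vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF, one common variant (an assumption, not taken from the thesis).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        matrix.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, matrix


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy headlines standing in for the newspaper dataset (invented for this example).
docs = [
    "The team won the football match",
    "Parliament passed the budget bill",
    "The striker scored in the football final",
]
vocab, matrix = tfidf_matrix(docs)
```

In this toy example the first and third headlines share the term "football", so their vectors have a positive cosine similarity, while the first and second share no terms. In a full pipeline, each row of `matrix` would be passed, together with its category label, to a classifier such as an SVM.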