TEXT SUMMARIZER USING MACHINE LEARNING

Younus, Maria Reg # 48535; Osama, Muhammad Reg # 48415; Khan, Ammarah Naseem Reg # 48519

URI: http://hdl.handle.net/123456789/16641

Date: 2020

Abstract:

Text summarization is the technique of shortening long pieces of text. With so much data circulating in the digital space, there is a need to develop machine learning techniques that can automatically shorten longer texts and deliver accurate summaries that fluently convey the intended message. The intention is to create a coherent and fluent summary containing only the main points outlined in the document. Machine learning and natural language processing (NLP) will be used to automate text summarization. The objective of this project is to develop a Text Summarizer using Machine Learning. It can overcome the grammar inconsistencies of the extractive method. The Text Summarizer will first split the paragraph into its constituent sentences; the text will then be pre-processed, and the next step will be tokenization. It will evaluate the weighted occurrence frequency of the words and then substitute the words with their weighted frequencies. All of the work will be done using the R tool. The main advantages of this technique are that it condenses the source text into a shorter version that preserves its semantics, reduces reading time, and expresses the main intent of the given document. Text summarization selects the most significant portions of the text and generates coherent summaries that express the main intent of the given document. Extraction-based text summarization involves selecting sentences of high relevance (rank) from the document. Abstractive text summarization algorithms create new phrases and sentences that relay the most useful information from the original text, just as humans do. The system first pre-processes the given text, then performs tokenization and vectorization using the TextRank algorithm.
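
The frequency-based steps described in the abstract (sentence splitting, tokenization, weighted word frequencies, sentence selection) can be illustrated with a minimal sketch. The project itself states that the work is done in R; the Python below is only an illustration and is not the authors' implementation. The function names, the stop-word list, the normalization by the most frequent word, and the n_sentences parameter are all assumptions made for this example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real system would use a fuller one).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and",
              "it", "that", "this", "for", "on", "with", "as", "be", "by"}


def split_sentences(text):
    """Split a paragraph into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase a sentence and return its word tokens, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOP_WORDS]


def weighted_frequencies(sentences):
    """Map each word to its count divided by the count of the most frequent word."""
    counts = Counter(w for s in sentences for w in tokenize(s))
    if not counts:
        return {}
    top = max(counts.values())
    return {w: c / top for w, c in counts.items()}


def summarize(text, n_sentences=2):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    weights = weighted_frequencies(sentences)
    # A sentence's score is the sum of the weighted frequencies of its words.
    scores = [sum(weights[w] for w in tokenize(s)) for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    sample = "Cats sleep a lot. Cats chase mice and cats climb trees. Dogs bark."
    print(summarize(sample, n_sentences=1))
```

Summing word weights favors longer sentences; dividing each score by the sentence's token count is a common alternative when that bias is unwanted. This sketch covers only extractive selection and does not implement TextRank or abstractive generation.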