Abstract:
Text summarization is the task of shortening long pieces of text. With so much data
circulating in the digital space, there is a need for machine learning techniques
that can automatically shorten long texts and deliver accurate summaries that
fluently convey the intended message. The intention is to
create a coherent and fluent summary containing only the main points outlined in the
document. Machine learning and natural language processing (NLP) will be used to
automate text summarization. The objective of this project is to develop a text
summarizer using machine learning that can overcome the grammatical inconsistencies
of the extractive method.
The text summarizer will first split the paragraph into its constituent sentences.
The text will then be preprocessed and tokenized. Next, the summarizer will compute
the weighted occurrence frequency of each word and substitute the words with their
weighted frequencies. All of the work will be done in R.
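
The steps above (sentence splitting, tokenization, weighted word frequencies) can be
sketched roughly as follows. This is a minimal illustration in Python rather than the
project's R code, and several details are assumptions not stated in the text: the
tiny stop-word list, normalizing each count by the highest count, and scoring a
sentence by summing the weights of its words. All function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: a deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "it", "that", "on"}


def split_sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase the sentence and keep word tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def weighted_frequencies(sentences):
    """Count each word, then divide by the highest count (assumed weighting)."""
    counts = Counter(w for s in sentences for w in tokenize(s))
    if not counts:
        return {}
    top = max(counts.values())
    return {w: c / top for w, c in counts.items()}


def summarize(text, n_sentences=2):
    """Keep the n highest-scoring sentences, returned in document order."""
    sentences = split_sentences(text)
    weights = weighted_frequencies(sentences)
    # Substitute each word with its weight; the sum is the sentence score (assumption).
    scores = [sum(weights[w] for w in tokenize(s)) for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in keep)
```

Calling `summarize(text, n_sentences=2)` on a short paragraph returns the two
sentences whose words occur most often overall, which is the extractive behaviour
the pipeline describes.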
The main advantages of this technique are that it condenses the source text into a
shorter version that preserves its meaning and that it reduces reading time. Text
summarization selects the most significant portions of the text and generates
coherent summaries that express the main intent of the given document.
Extraction-based text summarization involves selecting
sentences of high relevance (rank) from the document. Abstractive text
summarization algorithms create new phrases and sentences that relay the most
useful information from the original text, just as humans do. The system first
preprocesses the given text, then performs tokenization and vectorization, and
finally ranks the sentences using the TextRank algorithm.
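
A TextRank-style ranking of sentences might look like the sketch below. It is written
in Python for illustration rather than in R, and it makes assumptions the text does
not specify: sentences are vectorized as bag-of-words counts, edge weights are cosine
similarities (the original TextRank formulation uses a normalized word-overlap
measure instead), and the damping factor and iteration count are conventional
defaults. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]


def vectorize(sentence):
    """Bag-of-words vector mapping each lowercase word to its count."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by a weighted PageRank iteration over the similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    vectors = [vectorize(s) for s in sentences]
    # Edge weight between two different sentences is their similarity.
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to the edge weight.
            rank = sum(sim[j][i] / out_weight[j] * scores[j]
                       for j in range(n) if out_weight[j] > 0)
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def summarize(text, n_sentences=2):
    """Keep the n top-ranked sentences, returned in document order."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:n_sentences]))
```

Sentences that share vocabulary with many other sentences receive higher scores, so
an off-topic sentence with no overlapping words ends up at the bottom of the ranking.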