Paragraph theme suggestions

Huzaifa Sajjad Malik, 01-235161-015; Fahim Ullah Khan, 01-235161-008

dc.contributor.author	Huzaifa Sajjad Malik, 01-235161-015
dc.contributor.author	Fahim Ullah Khan, 01-235161-008
dc.date.accessioned	2021-01-16T03:07:02Z
dc.date.available	2021-01-16T03:07:02Z
dc.date.issued	2020
dc.identifier.uri	http://hdl.handle.net/123456789/10793
dc.description	Supervised by Dr. Muhammad Asfand-e-Yar	en_US
dc.description.abstract	People communicate and share their ideas through emails, letters, and blog posts, to name a few, but readers today rarely have time to go through a whole article to find its main point. Before reading a document, we first look at its title; if the title is meaningful and attractive, we open the article, email, or blog. A really good article or blog post on financial management, for example, may go unopened simply because its title is unconvincing, which does a disservice to its content. To address this problem, we developed three variations of the basic bag-of-words model. The first model generates keywords: it tokenizes the document using spaCy and treats the most frequently repeated words as keywords. The second model generates a summary: it takes this list of keywords, divides each keyword's frequency by the maximum frequency, and then ranks each sentence based on the presence of these keywords in that sentence. The third is a POS (part-of-speech) bag-of-words model that extracts the subject of a document: after tokenizing the document with spaCy, it stores the frequency of each word under its part of speech. Once the words and their frequencies are stored in the dictionary for their respective part of speech, we construct a sentence from these words that serves as the subject or title of the document. Our experimental results show that the system generates satisfactory results. These could be improved either by using TF-IDF to extract the keywords from the document, or by using LDA with a large dataset and a set of topic classes to extract the topic of the document and assign the document to its closest topic.	en_US
dc.language.iso	en	en_US
dc.publisher	Computer Sciences BUIC	en_US
dc.relation.ispartofseries	BS (IT);MFN-P 9053
dc.subject	Paragraph Theme	en_US
dc.title	Paragraph theme suggestions	en_US
dc.type	Project Reports	en_US
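
The keyword and summary steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it swaps in a simple regular-expression tokenizer for spaCy so that it runs with the standard library alone, and the stop-word list, function names, and sentence-splitting rule are all assumptions introduced here. The POS-based title model is not covered.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not say which words are filtered.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and",
              "in", "it", "on", "for", "with", "this", "that"}


def tokenize(text):
    """Lowercase word tokens; a regex stand-in for the spaCy tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_weights(text):
    """Count non-stop-word tokens and divide each count by the maximum count."""
    counts = Counter(t for t in tokenize(text) if t not in STOP_WORDS)
    if not counts:
        return {}
    top = max(counts.values())
    return {word: n / top for word, n in counts.items()}


def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences, kept in their original order."""
    weights = keyword_weights(text)
    # Naive sentence split on terminal punctuation (an assumption of this sketch).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # A sentence's score is the sum of the weights of the keywords it contains.
    scores = [sum(weights.get(t, 0.0) for t in tokenize(s)) for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:n_sentences])
    return [sentences[i] for i in keep]
```

Calling `keyword_weights(document)` gives each word a weight between 0 and 1, and `summarize(document, n_sentences=2)` returns the two sentences richest in high-weight words. Summing weights favors longer sentences; whether the original system normalizes for sentence length is not stated in the abstract.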