Paragraph theme suggestions

dc.contributor.author Huzaifa Sajjad Malik, 01-235161-015
dc.contributor.author Fahim Ullah Khan, 01-235161-008
dc.date.accessioned 2021-01-16T03:07:02Z
dc.date.available 2021-01-16T03:07:02Z
dc.date.issued 2020
dc.identifier.uri http://hdl.handle.net/123456789/10793
dc.description Supervised by Dr. Muhammad Asfand-e-Yar en_US
dc.description.abstract People today communicate, influence, and express their thoughts through emails, letters, and blog posts, to name a few, but few readers have time to go through a whole article to find its main point. Before reading a document, we first look at its title; if the title is meaningful and attractive, we open the article, email, or blog. A strong article or blog post on financial management, for example, may go unread because its title is unconvincing, which does a disservice to its content. To address this problem, we propose three variations of the basic bag-of-words model. The first model generates keywords from a given document: it tokenizes the document using spaCy and treats the most frequently repeated words as keywords. The second model generates a summary: it takes this list of keywords, divides each keyword's frequency by the maximum frequency, and then ranks each sentence based on the presence of these keywords in that sentence. The third model is another bag-of-words model that extracts the subject of a given document. This POS (part-of-speech) bag of words stores the frequency of each word under its part of speech after the document is tokenized using spaCy. Once the POS words and their frequencies are stored in their respective dictionaries, we construct a sentence from these words that acts as the subject or title of the document. Our experimental results show that the system generates satisfactory results, which could be further improved either by using TF-IDF to extract keywords from the document or by using LDA, with a large dataset and classification classes, to extract the document's topic and rank the document against its closest topic. en_US
dc.language.iso en en_US
dc.publisher Computer Sciences BUIC en_US
dc.relation.ispartofseries BS (IT);MFN-P 9053
dc.subject Paragraph Theme en_US
dc.title Paragraph theme suggestions en_US
dc.type Project Reports en_US
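
The first two models described in the abstract (frequency-based keywords, then sentence ranking by normalized keyword frequency) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the report tokenizes with spaCy, whereas this sketch assumes a simple regex tokenizer and a tiny hypothetical stop-word list so that it runs with the standard library alone. The function names and the choice to score a sentence by summing its keyword weights are also assumptions.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list (the report relies on spaCy instead).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
              "it", "for", "on", "that", "this", "with", "as", "by", "be"}


def tokenize(text):
    """Return lowercase word tokens (a regex stand-in for spaCy tokenization)."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_weights(text):
    """Count non-stop-word tokens and divide each count by the maximum count."""
    counts = Counter(t for t in tokenize(text) if t not in STOP_WORDS)
    if not counts:
        return {}
    top = max(counts.values())
    return {word: n / top for word, n in counts.items()}


def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences, kept in their original order."""
    weights = keyword_weights(text)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Assumed scoring rule: a sentence's score is the sum of its keyword weights.
    scored = [(sum(weights.get(t, 0.0) for t in tokenize(s)), i)
              for i, s in enumerate(sentences)]
    # Highest score first; ties go to the earlier sentence.
    best = sorted(scored, key=lambda p: (-p[0], p[1]))[:n_sentences]
    return [sentences[i] for _, i in sorted(best, key=lambda p: p[1])]
```

Calling `keyword_weights(document)` gives each word a weight between 0 and 1, and `summarize(document, n_sentences=2)` returns the two sentences that carry the most keyword weight. Summing weights favors longer sentences; dividing by sentence length would be one possible adjustment, though the abstract does not say whether the authors do this.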

