Bahria University DSpace digital repository
dc.contributor.author | Huzaifa Sajjad Malik, 01-235161-015 | |
dc.contributor.author | Fahim Ullah Khan, 01-235161-008 | |
dc.date.accessioned | 2021-01-16T03:07:02Z | |
dc.date.available | 2021-01-16T03:07:02Z | |
dc.date.issued | 2020 | |
dc.identifier.uri | http://hdl.handle.net/123456789/10793 | |
dc.description | Supervised by Dr. Muhammad Asfand-e-Yar | en_US |
dc.description.abstract | People communicate, influence and express their thoughts to others through emails, letters and blog posts, to name a few, but today's readers rarely have time to go through a whole article to understand its main point. Before reading a document we first look at its title; if the title is meaningful and attractive, we open the article, email or blog. Suppose there is an excellent article or blog post on financial management whose title is not at all convincing: no one will bother to open it, which does a disservice to its content. To solve this problem we propose three variations of the basic bag-of-words model. The first model generates keywords from a given document: it tokenizes the document using spaCy and treats the most frequently repeated words as keywords. The second model generates a summary: it takes this list of keywords, divides each keyword's frequency by the maximum frequency, and then ranks each sentence based on the presence of these keywords in that sentence. Lastly, a third bag-of-words model extracts the subject from a given document. This POS (parts of speech) bag of words stores the frequency of each word under its part of speech after the document is tokenized using spaCy. Once all the POS words and their frequencies are stored in their respective dictionaries, we construct a sentence from these POS words, which acts as the subject or title of the document. Our experimental results show that the system generates satisfactory results, which can be further improved either by using TF-IDF to extract the keywords from the document or by using LDA to extract the topic of the document, with a large dataset and classification classes, which can then rank the document against its closest topic. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Computer Sciences BUIC | en_US |
dc.relation.ispartofseries | BS (IT);MFN-P 9053 | |
dc.subject | Paragraph Theme | en_US |
dc.title | Paragraph theme suggestions | en_US |
dc.type | Project Reports | en_US |
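
The keyword and summary models described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes a simple regex tokenizer and a tiny hand-written stop-word list in place of spaCy, and the function names (tokenize, keyword_frequencies, top_keywords, summarize) are hypothetical. It only shows the idea of counting word frequencies, dividing each by the maximum frequency, and ranking sentences by the keywords they contain.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the project uses spaCy).
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "of", "to", "in",
              "on", "is", "are", "it", "for", "with", "that", "this"}


def tokenize(text):
    """Lowercase word tokens; a regex stand-in for spaCy tokenization."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_frequencies(text):
    """Bag of words: count every token that is not a stop word."""
    return Counter(t for t in tokenize(text) if t not in STOP_WORDS)


def top_keywords(text, n=5):
    """Return the n most frequently repeated words as keywords."""
    return [word for word, _ in keyword_frequencies(text).most_common(n)]


def summarize(text, n_sentences=1):
    """Rank sentences by the normalized frequencies of the words they contain."""
    freqs = keyword_frequencies(text)
    if not freqs:
        return []
    max_freq = max(freqs.values())
    # Divide each keyword's frequency by the maximum frequency.
    weights = {word: count / max_freq for word, count in freqs.items()}
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = [
        (sum(weights.get(t, 0.0) for t in tokenize(s)), i, s)
        for i, s in enumerate(sentences)
    ]
    # Keep the highest-scoring sentences, then restore document order.
    best = sorted(scored, key=lambda item: (-item[0], item[1]))[:n_sentences]
    return [s for _, _, s in sorted(best, key=lambda item: item[1])]


if __name__ == "__main__":
    sample = ("Budgeting helps savings. "
              "Savings grow with budgeting and savings plans. "
              "Cats sleep.")
    print(top_keywords(sample, 2))
    print(summarize(sample, 1))
```

In this sketch a sentence's score is the sum of the normalized weights of its words, so longer sentences tend to score higher; whether the original system corrects for sentence length is not stated in the abstract.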