| dc.contributor.author | Kinza Faisal Jamal, 01-241161-008 | |
| dc.date.accessioned | 2018-08-29T07:47:10Z | |
| dc.date.available | 2018-08-29T07:47:10Z | |
| dc.date.issued | 2018 | |
| dc.identifier.uri | http://hdl.handle.net/123456789/7374 | |
| dc.description | Supervised by Dr. Raja Muhammad Suleman | en_US |
| dc.description.abstract | Natural Language Processing (NLP) is the study of interaction between humans and machines through natural language. Natural language is extremely rich in form and structure, and it is highly ambiguous. NLP offers many techniques for resolving this ambiguity; one of the primary methods is Parts of Speech (POS) tagging. A POS tagger is software that assigns each word its part-of-speech tag. A great deal of work has been done on POS tagging for English and European languages, but Urdu has few POS taggers and limited resources. The current POS taggers for Urdu have several shortcomings, such as dependence on lexical databases and a lack of context-based prediction of a word's tag. Moreover, all existing POS taggers for Urdu have been built using Supervised Machine Learning, which depends on the availability of completely annotated datasets. This research proposes a Semi-Supervised Machine Learning model-based Parts of Speech Tagger (POST) that does not depend on a lexical database or a completely annotated corpus; instead, the model is trained on a partially annotated corpus. The model used is the Maximum Entropy Markov Model (MEMM). It gives promising results, with an accuracy of ~93%, which is significant when compared with the results of existing POS taggers for Urdu that employ a Supervised Machine Learning approach. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | Software Engineering, Bahria University Engineering School Islamabad | en_US |
| dc.relation.ispartofseries | MS SE;T-0738 | |
| dc.subject | Software Engineering | en_US |
| dc.title | Parts-of-Speech Tagger (POST) for Urdu Language Using Semi-Supervised Machine Learning Model (T-0738) (MFN 6885) | en_US |
| dc.type | MS Thesis | en_US |