Parts-of-speech tagger (post) for Urdu language Using semi-supervised machine learning model (T-0738) (MFN 6885)

Kinza Faisal Jamal, 01-241161-008

dc.contributor.author	Kinza Faisal Jamal, 01-241161-008
dc.date.accessioned	2018-08-29T07:47:10Z
dc.date.available	2018-08-29T07:47:10Z
dc.date.issued	2018
dc.identifier.uri	http://hdl.handle.net/123456789/7374
dc.description	Supervised by Dr. Raja Muhammad Suleman	en_US
dc.description.abstract	Natural Language Processing (NLP) is the study of interaction between humans and machines through natural language. Natural language is extremely rich in form and structure, and highly ambiguous. NLP offers many techniques for resolving ambiguity; one of the primary ones is Parts of Speech (POS) tagging. A POS tagger is software that assigns each word its part-of-speech tag. A lot of work has been done on POS tagging for English and European languages, but Urdu has few POS taggers and resources. The current POS taggers for Urdu have several issues, such as dependence on lexical databases and a lack of contextual prediction for words. Moreover, all POS taggers for Urdu have been built using supervised machine learning, which depends on the availability of completely annotated datasets. This research proposes a semi-supervised machine learning model-based Parts of Speech Tagger (POST) that does not depend on a lexical database or a completely annotated corpus; instead, it uses a partially annotated corpus to train the model. The model used is the Maximum Entropy Markov Model (MEMM). It gives promising results, with an accuracy of ~93%, which is significant when compared to the results of the existing POS taggers for Urdu that employ a supervised machine learning approach.	en_US
dc.language.iso	en	en_US
dc.publisher	Software Engineering, Bahria University Engineering School Islamabad	en_US
dc.relation.ispartofseries	MS SE;T-0738
dc.subject	Software Engineering	en_US
dc.title	Parts-of-speech tagger (post) for Urdu language Using semi-supervised machine learning model (T-0738) (MFN 6885)	en_US
dc.type	MS Thesis	en_US