A Novel Approach to Manage LSA's Sytactical Blindness Problem (T-0722) (MFN 5883)

Mohsin Hassan Khan, 01-244151-039

dc.contributor.author	Mohsin Hassan Khan, 01-244151-039
dc.date.accessioned	2017-08-02T06:50:55Z
dc.date.available	2017-08-02T06:50:55Z
dc.date.issued	2017
dc.identifier.uri	http://hdl.handle.net/123456789/3514
dc.description	Supervised by Dr. Raja M. Suleman	en_US
dc.description.abstract	Natural language processing (NLP) is a set of computational techniques for automatically analyzing and representing human language. NLP has been employed in many applications, such as information retrieval, information processing, language translation, and automated answer grading. The main problem with NLP is the high level of uncertainty in natural language, which makes automated analysis and extraction of useful information very difficult. Several approaches have been developed for automated grading. Latent Semantic Analysis (LSA) is one of the most widely used approaches for automated text matching. LSA is a corpus-based approach that evaluates similarity on the basis of semantic relations among words and ignores the structural composition of a sentence. Because of this structural blindness, LSA can treat a logically wrong answer as a correct one. LSA cannot recognize sentences that are semantically related but are the inverse of each other [8]. Furthermore, LSA cannot handle “gaming the system”, where a user provides only a list of keywords without proper sentence structure. The aim of our research is to develop an algorithm, Extended Latent Semantic Analysis (xLSA), which focuses on the syntactic composition of a sentence and overcomes LSA’s syntactic blindness problem. xLSA examines sentences and checks that proper sentence structure exists, to address the “gaming the system” problem. xLSA analyzes text inputs to recognize their dependency structure and then decomposes each sentence to identify its subject, verb, and object. Sentences are then compared, and an approximation of the syntactic and semantic space is generated for similar texts. xLSA computes a semantic similarity score for two sentences and also identifies inverse sentences, negative sentences, and “gaming the system”. We tested xLSA with 200 semantically similar sentences from two corpora [28] [29]. Results show that xLSA outperforms traditional LSA and identifies inverse sentences, negative sentences, and lists of keywords that lack proper sentence structure.	en_US
dc.language.iso	en	en_US
dc.publisher	Software Engineering, Bahria University Engineering School Islamabad	en_US
dc.relation.ispartofseries	MS SE;T-0722
dc.subject	Software Engineering	en_US
dc.title	A Novel Approach to Manage LSA's Sytactical Blindness Problem (T-0722) (MFN 5883)	en_US
dc.type	MS Thesis	en_US
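
The syntactic blindness described in the abstract can be illustrated with a minimal sketch. This is not the thesis's xLSA implementation: a plain bag-of-words cosine stands in for LSA (no corpus or SVD is involved), and a toy subject-verb-object splitter stands in for real dependency parsing. The names `bow_cosine`, `toy_svo`, `compare`, and the tiny `VERBS` lexicon are all hypothetical and exist only for this illustration.

```python
import math
from collections import Counter

# Hypothetical toy lexicons, assumed only for this illustration.
STOPWORDS = {"the", "a", "an"}
NEGATIONS = {"not", "no", "never"}
VERBS = {"bit", "chased", "saw"}


def tokens(sentence):
    """Lowercase, whitespace-split tokens with surrounding punctuation removed."""
    words = (w.strip(".,!?").lower() for w in sentence.split())
    return [w for w in words if w]


def bow_cosine(a, b):
    """Order-insensitive similarity: a stand-in for LSA, not LSA itself."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_svo(sentence):
    """Return (subject, verb, object), or None if the words do not form
    a simple 'subject verb object' pattern (toy assumption)."""
    content = [w for w in tokens(sentence)
               if w not in STOPWORDS and w not in NEGATIONS]
    if len(content) != 3 or content[1] not in VERBS:
        return None
    return tuple(content)


def compare(a, b):
    """Classify a sentence pair using structure as well as vocabulary."""
    svo_a, svo_b = toy_svo(a), toy_svo(b)
    if svo_a is None or svo_b is None:
        return "no sentence structure"
    negated_a = any(w in NEGATIONS for w in tokens(a))
    negated_b = any(w in NEGATIONS for w in tokens(b))
    if svo_a == svo_b:
        return "negation" if negated_a != negated_b else "same"
    if svo_a == (svo_b[2], svo_b[1], svo_b[0]):
        return "inverse"
    return "different"


reference = "The dog bit the man."
for candidate in ("The man bit the dog.",
                  "The dog never bit the man.",
                  "dog man bit"):
    print(candidate, round(bow_cosine(reference, candidate), 3),
          compare(reference, candidate))
```

The word-overlap score stays high for all three candidates, while the structural check separates the inverse sentence, the negated sentence, and the bare keyword list. A realistic version would replace `toy_svo` with a dependency parser and `bow_cosine` with similarity in an LSA space trained on a corpus.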