DSpace Repository

A Novel Approach to Manage LSA's Syntactical Blindness Problem (T-0722) (MFN 5883)

dc.contributor.author Mohsin Hassan Khan, 01-244151-039
dc.date.accessioned 2017-08-02T06:50:55Z
dc.date.available 2017-08-02T06:50:55Z
dc.date.issued 2017
dc.identifier.uri http://hdl.handle.net/123456789/3514
dc.description Supervised by Dr. Raja M. Suleman en_US
dc.description.abstract Natural language processing (NLP) is a set of computational techniques for automatically analyzing and representing human language. NLP has been employed in many applications, such as information retrieval, information processing, language translation, and automated answer grading. The main problem in NLP is the high level of uncertainty in natural language, which makes automated analysis and extraction of useful information very difficult. Several approaches have been developed for automated grading. Latent Semantic Analysis (LSA) is one of the most widely used approaches for automated text matching. LSA is a corpus-based approach that evaluates similarity on the basis of semantic relations among words and ignores the structural composition of a sentence. Because of this structural blindness, LSA can treat a logically wrong answer as correct: it cannot recognize sentences that are semantically related but the inverse of each other [8]. Furthermore, LSA cannot handle “gaming the system”, where the user provides only a list of keywords without proper sentence structure. The aim of our research is to develop an algorithm, Extended Latent Semantic Analysis (xLSA), which focuses on the syntactic composition of a sentence and overcomes LSA’s syntactic blindness problem. xLSA examines sentences and checks that a proper sentence structure exists, to address the “gaming the system” problem. xLSA analyzes text inputs to recognize their dependency structure and then decomposes each sentence to identify its subject, verb, and object. Sentences are then compared, and an approximation of the syntactic and semantic space is generated for similar texts. xLSA computes the semantic similarity score of two sentences and also identifies inverse sentences, negative sentences, and “gaming the system”. We tested xLSA with 200 semantically similar sentences from two corpora [28] [29]. Results show that xLSA outperforms traditional LSA and identifies inverse sentences, negative sentences, and lists of keywords that lack proper sentence structure. en_US
dc.language.iso en en_US
dc.publisher Software Engineering, Bahria University Engineering School Islamabad en_US
dc.relation.ispartofseries MS SE;T-0722
dc.subject Software Engineering en_US
dc.title A Novel Approach to Manage LSA's Syntactical Blindness Problem (T-0722) (MFN 5883) en_US
dc.type MS Thesis en_US
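
The syntactic blindness described in the abstract can be illustrated with a small sketch. This is not the thesis's xLSA implementation: a plain bag-of-words cosine stands in for LSA (both ignore word order), and the subject-verb-object triples are supplied by hand, whereas the abstract says xLSA derives them from a dependency structure. The function names `bow_cosine` and `is_inverse` are hypothetical.

```python
import math
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity over word counts; word order is ignored.

    Assumption: used here as a simple stand-in for an LSA score.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_inverse(svo_a, svo_b) -> bool:
    """True when two (subject, verb, object) triples share a verb
    but have subject and object swapped."""
    s1, v1, o1 = svo_a
    s2, v2, o2 = svo_b
    return v1 == v2 and s1 == o2 and o1 == s2 and s1 != o1


a = "the dog bit the man"
b = "the man bit the dog"

# Both sentences contain the same words, so the order-blind score is
# approximately 1.0 even though the meanings are reversed.
similarity = bow_cosine(a, b)

# Hand-written triples (assumption: a dependency parser would supply these).
inverse = is_inverse(("dog", "bit", "man"), ("man", "bit", "dog"))

print(similarity, inverse)
```

A structure-aware check of this kind flags the pair as inverse even though the word-based score treats the two sentences as essentially identical, which is the gap the abstract says xLSA is meant to close.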

