Exploiting Transliterated Words for Finding Similarity in Inter-Language News Articles.

dc.contributor.author Sameea Naeem, 01-249201-014
dc.date.accessioned 2022-08-04T05:25:46Z
dc.date.available 2022-08-04T05:25:46Z
dc.date.issued 2022
dc.identifier.uri http://hdl.handle.net/123456789/13008
dc.description Supervised by Dr. Arif ur Rahman en_US
dc.description.abstract Finding similarities between inter-language news articles is a challenging problem in Natural Language Processing (NLP). Major human activities become news and are uploaded to different news platforms in many different languages. Because it is difficult for a reader to find similar news articles in a language other than their native one, there is a need for an automatic system that can estimate the similarity between two inter-language news articles. Automatic detection of similarity between two news articles is a difficult task; however, the use of machine learning techniques along with English-Urdu transliterated words can make it easier. For this purpose, this research proposes an ML model combined with English-Urdu word transliteration that indicates whether an English news article is similar to an Urdu news article. Existing approaches to finding similarities have a major drawback when the archives contain articles in low-resourced languages like Urdu alongside English news articles. This research uses a lexicon to link Urdu and English news articles. A literature review shows that very few researchers have worked on Urdu and English news articles, so the first step is to build an Urdu-English lexicon or dictionary. The second step is Urdu text tokenization, since Urdu text is difficult to split into word segments. The main focus of this research is the Urdu-English transliteration system. Because Urdu language processing applications such as machine translation and text-to-speech are unable to handle English text at the same time, this research proposes a technique to find similarities between English and Urdu news articles based on transliteration. en_US
dc.language.iso en en_US
dc.publisher Computer Sciences BUIC en_US
dc.relation.ispartofseries MS (DS);T-10572
dc.subject Natural Language Processing en_US
dc.subject Human Activities en_US
dc.title Exploiting Transliterated Words for Finding Similarity in Inter-Language News Articles. en_US
dc.type MS Thesis en_US

