| dc.description.abstract |
News bulletin data typically contain multiple stories on various topics, and identifying the story boundaries is vital for search and curation. With the exponential increase in news sources, story segmentation will allow consumers to easily access their preferred content and will help service providers offer individualized services to their clients. Given the dynamic range of topics, smooth story transitions, and varied duration of each story, autonomous news story segmentation is a challenging task. This research proposes a novel approach to segmenting stories from local news bulletins. We focus on Urdu news and obtain the textual data from news bulletins. The lack of spaces between words and the inclusion of spaces inside the same word make word segmentation in Urdu text far more complicated than in other languages. Stories are separated by a specific delimiter, and positive and negative pairs are then generated as input to the Siamese network. The Siamese network examines the similarity between the encoded sentences of stories. In contrast to the traditional segmentation framework, where the model is intended to learn class labels, a Siamese network allows for the learning of similarities between samples from the same class and differences between samples from different classes. The Siamese neural network model achieved an F1-measure of 0.74. |
en_US |