Mining Emerging News

Muhammad Ismail Mughal, 01-134112-041; Babar Mustafa, 01-134112-009

URI: http://hdl.handle.net/123456789/962

Date: 2015

Abstract:

We live in the information age. So much information is published on the internet that it is next to impossible to go through all of it. This project focuses on extracting “interesting” information from the web. As a first step, we assume that newspapers report the most interesting information, and we therefore develop a system that extracts interesting information from the news feeds of news websites. The system is fully automated and relies on only a few input parameters. It takes RSS feeds from the specified news websites and extracts the title of each news item. Next, the system removes repeated and insignificant words from the news titles, and a tokenization module transforms the remaining keywords into tokens. These tokens are combined into time-ordered sequences of items. A sequence mining algorithm is then applied to extract the most interesting sequences, and a detokenization process maps them back to words to identify the most interesting news.
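The pipeline described in the abstract can be sketched end to end as follows. This is a minimal illustration in Python under stated assumptions, not the project's implementation. The abstract does not name its stop-word list, tokenization scheme, sequence mining algorithm, or how "most interesting" is scored. As a result, the stop words, the helper names (`extract_titles`, `keywords`, `tokenize`, `prefixspan`, `contains`, `mine_emerging_news`), the use of PrefixSpan as the miner, and the "longest pattern, then highest support" ranking are all hypothetical. The sketch treats each title as one sequence of keyword tokens and orders the titles by their RSS `pubDate`. The sample feed is invented for the example.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter
from email.utils import parsedate_to_datetime

# Hypothetical stop-word list; the project's actual list is not given.
STOPWORDS = {"a", "an", "the", "in", "on", "of", "as", "and",
             "to", "for", "is", "at", "by", "with"}


def extract_titles(rss_xml):
    """Return the <item> titles of an RSS 2.0 feed, oldest first by pubDate."""
    root = ET.fromstring(rss_xml)
    dated = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        pub = item.findtext("pubDate")
        if title and pub:  # assumption: items without a title or date are skipped
            dated.append((parsedate_to_datetime(pub), title))
    dated.sort(key=lambda pair: pair[0])
    return [title for _, title in dated]


def keywords(title):
    """Lower-case the title, drop stop words, and drop repeated words."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return list(dict.fromkeys(w for w in words if w not in STOPWORDS))


def tokenize(keyword_lists):
    """Map each distinct keyword to an integer token, in order of first use."""
    vocab = {}
    sequences = [[vocab.setdefault(w, len(vocab)) for w in kws]
                 for kws in keyword_lists]
    return sequences, vocab


def prefixspan(db, min_support, max_len=3):
    """PrefixSpan: frequent order-preserving subsequences (gaps allowed)."""
    results = []

    def grow(prefix, projected):
        # projected: one (sequence index, start position) pair per sequence
        counts = Counter()
        for sid, start in projected:
            counts.update(set(db[sid][start:]))  # count each sequence once
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results.append((pattern, support))
            if len(pattern) < max_len:
                next_proj = [(sid, db[sid].index(item, start) + 1)
                             for sid, start in projected
                             if item in db[sid][start:]]
                grow(pattern, next_proj)

    grow((), [(sid, 0) for sid in range(len(db))])
    return results


def contains(seq, pattern):
    """True if pattern occurs in seq in order (gaps allowed)."""
    it = iter(seq)
    return all(tok in it for tok in pattern)


def mine_emerging_news(rss_xml, min_support=2):
    titles = extract_titles(rss_xml)
    sequences, vocab = tokenize([keywords(t) for t in titles])
    patterns = prefixspan(sequences, min_support)
    if not patterns:
        return [], []
    # Assumed ranking of "interesting": longest pattern, then highest support.
    best, _ = max(patterns, key=lambda p: (len(p[0]), p[1]))
    id_to_word = {i: w for w, i in vocab.items()}
    words = [id_to_word[tok] for tok in best]  # detokenization
    matching = [t for t, s in zip(titles, sequences) if contains(s, best)]
    return words, matching


# Invented sample feed, listed out of time order on purpose.
SAMPLE_RSS = """<rss version="2.0"><channel>
<item><title>Storm damage in the coastal city</title><pubDate>Mon, 02 Mar 2015 12:00:00 +0000</pubDate></item>
<item><title>Local team wins final</title><pubDate>Mon, 02 Mar 2015 09:15:00 +0000</pubDate></item>
<item><title>Storm hits coastal city</title><pubDate>Mon, 02 Mar 2015 08:00:00 +0000</pubDate></item>
<item><title>Coastal city storm closes schools</title><pubDate>Mon, 02 Mar 2015 10:30:00 +0000</pubDate></item>
</channel></rss>"""

if __name__ == "__main__":
    words, matching = mine_emerging_news(SAMPLE_RSS, min_support=3)
    print(words)  # ['coastal', 'city']
    for title in matching:
        print(title)
```

With `min_support=3`, the only multi-word pattern shared by at least three headlines is ('coastal', 'city'). The three storm headlines are therefore reported and the sports headline is not. A real deployment would fetch the feeds over HTTP. Raising `min_support` or `max_len` trades recall for more specific patterns. PrefixSpan is used here only because it is a compact, well-known sequential pattern miner. The abstract does not say which algorithm the project used.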