DETECTING DISEASE OUTBREAKS BY ANALYZING TWEETS

Nehal, Hiba Reg # 43697; Zaidi, Sarmad Hussain Reg # 43789; Siddiqui, Shaheryar Ali Reg # 43791

URI: http://hdl.handle.net/123456789/15522

Date: 2019

Abstract:

Analysing Twitter user data, more specifically public messages or tweets, can be very useful for monitoring diseases and infections worldwide. Diseases, which are identified by specific signs and symptoms and classified as medical conditions, can cause serious damage to an area's resources and population if left unchecked. Keeping people aware of potential dangers or epidemics also reduces widespread hysteria and helps them prepare during times of crisis. Detecting disease outbreaks at an early stage would be an incredibly useful tool for the health sector, and it is feasible today because people post about their problems on social media and in text messages, generating a massive amount of data every day. First, our system collects health-related tweets using the Twitter API, then filters, cleans and tags those tweets to create a functional, usable dataset. Essentially, this means that only tweets that mention a disease and contain a proper location are added to our dataset. We used both the SVM and Naive Bayes algorithms for data classification and tagging, which resulted in accuracy rates of 82% and 75% respectively. The TF-IDF vectorizer was used for feature extraction with both algorithms. For map plotting and visualization, we used a simple HTML/CSS page with JavaScript to show our findings on a map and make the results more readable and interactive. We decided to use Flask, an extensible third-party web microframework for Python, to achieve our final goal. Our research shows that Twitter data has many applications for public health research.
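The filtering, cleaning and tagging step described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the keyword sets `DISEASE_TERMS` and `KNOWN_LOCATIONS`, the helper names `clean` and `tag_tweet`, and the idea of matching against a tweet's text and a free-text location field are all hypothetical, since the abstract does not describe how diseases and locations were recognised.

```python
import re
from typing import Optional

# Hypothetical keyword sets for illustration; the thesis does not list its own.
DISEASE_TERMS = {"dengue", "malaria", "flu", "cholera"}
KNOWN_LOCATIONS = {"karachi", "lahore", "islamabad"}

URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
NON_LETTER = re.compile(r"[^a-z\s]")


def clean(text: str) -> str:
    """Lowercase, drop URLs and @mentions, and strip punctuation and digits."""
    text = URL_OR_MENTION.sub(" ", text.lower())
    text = NON_LETTER.sub(" ", text)
    return " ".join(text.split())


def tag_tweet(text: str, user_location: str) -> Optional[dict]:
    """Return a tagged record, or None if the tweet should be filtered out.

    A tweet is kept only if it mentions at least one disease term and its
    location field contains a known place name (an assumed rule).
    """
    cleaned = clean(text)
    diseases = sorted(set(cleaned.split()) & DISEASE_TERMS)
    places = sorted(set(clean(user_location).split()) & KNOWN_LOCATIONS)
    if not diseases or not places:
        return None
    return {"text": cleaned, "diseases": diseases, "location": places[0]}


if __name__ == "__main__":
    samples = [
        ("Down with #Dengue again :( http://t.co/xyz @friend", "Karachi, Pakistan"),
        ("Lovely weather today", "Karachi"),
        ("malaria cases rising", ""),
    ]
    for text, location in samples:
        print(tag_tweet(text, location))
```

In this sketch, only the first sample survives the filter; the second mentions no disease and the third has no usable location. Records produced this way would then be the input to the TF-IDF feature extraction and the SVM and Naive Bayes classifiers mentioned above.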