WEB APPLICATION VULNERABILITY ANALYSIS USING MACHINE LEARNING

Marrium Mehmood, 01-241201-009

dc.contributor.author	Marrium Mehmood, 01-241201-009
dc.date.accessioned	2022-12-20T08:18:11Z
dc.date.available	2022-12-20T08:18:11Z
dc.date.issued	2022
dc.identifier.uri	http://hdl.handle.net/123456789/14455
dc.description	Supervised by Dr. Tamim Ahmed Khan	en_US
dc.description.abstract	A substantial increase in web-based applications has changed business perspectives, and web applications have become almost essential to every business for online transactions. This has paved the way for cyber-attacks in which attackers exploit vulnerabilities in online systems for illegal system usage. Various machine-learning-based detection systems have been proposed to protect web applications. Most of these systems detect only a few, often outdated, attacks because this area of study lacks a good labelled dataset containing modern attacks. This thesis proposes payload-based web attack detection that considers modern attacks as identified by OWASP and NIST. The proposed system detects web attacks by analyzing the request payload. It is designed in two stages: pre-processing and processing. The pre-processing stage consists of dataset creation, feature extraction and feature selection. To evaluate the system experimentally, we used the publicly available HTTP Param dataset in addition to our payload-based dataset. To enhance performance, we implemented automatic feature extraction from the payload using a TF-IDF vectorizer. Three types of n-grams (unigrams, bigrams and trigrams) are used separately, and the results are analyzed. We implemented four feature selection techniques to obtain the best feature subset: correlation-based feature selection (CFS), mutual information, random forest importance and Principal Component Analysis (PCA). The processing stage implements multiple classifiers in order to compare their results and performance. In particular, we implemented four machine learning classifiers (decision trees, random forest, logistic regression and K-Nearest Neighbors), fed them the different feature subsets as input and analyzed the results. We performed multiclass classification of attacks and evaluated the results using multiple evaluation measures.
In addition, the results of the best-performing model are validated using 10-fold cross-validation. Our system detects 8 vulnerability classes as well as normal requests: SQL injection, cross-site scripting (XSS), XML external entities (XXE), command injection, open redirect, carriage return and line feed (CRLF) injection, path traversal and file inclusion. The comparative analysis of the experimental results shows that the best scores come from bigram-extracted features selected by the embedded method (random forest importance) and classified with random forest: 99.48% accuracy, 98.66% precision, 96.50% recall and 97.41% F1-score.	en_US
dc.language.iso	en	en_US
dc.publisher	Software Engineering, Bahria University Engineering School Islamabad	en_US
dc.relation.ispartofseries	MS-SE;T-1823
dc.subject	Software Engineering	en_US
dc.title	WEB APPLICATION VULNERABILITY ANALYSIS USING MACHINE LEARNING	en_US
dc.type	MS Thesis	en_US
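The abstract describes automatic feature extraction from request payloads with a TF-IDF vectorizer over unigrams, bigrams or trigrams. The following is a minimal standard-library sketch of that idea. It is not the thesis code. The tokenizer (alphanumeric runs plus single punctuation characters, so that symbols such as ' < > / survive) is a hypothetical choice. The smoothed IDF with L2 normalization mirrors the scikit-learn TfidfVectorizer defaults and is an assumption about the configuration.

```python
import math
import re
from collections import Counter


def tokenize(payload):
    # Hypothetical tokenizer: lowercase, keep alphanumeric runs and keep each
    # punctuation character as its own token (quotes, brackets, slashes matter).
    return re.findall(r"[A-Za-z0-9_]+|[^\sA-Za-z0-9_]", payload.lower())


def ngrams(tokens, n):
    # Contiguous n-token windows joined by a space, e.g. bigrams for n=2.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(payloads, n=2):
    """Return (sparse L2-normalized TF-IDF vectors, idf table) for n-grams."""
    docs = [Counter(ngrams(tokenize(p), n)) for p in payloads]
    num_docs = len(docs)
    df = Counter()
    for counts in docs:
        df.update(counts.keys())  # document frequency of each n-gram
    # Smoothed IDF (assumed, as in scikit-learn's smooth_idf=True default).
    idf = {t: math.log((1 + num_docs) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for counts in docs:
        vec = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({t: v / norm for t, v in vec.items()} if norm else vec)
    return vectors, idf
```

Calling tfidf(payloads, n=1), tfidf(payloads, n=2) and tfidf(payloads, n=3) produces the three separate feature sets that the abstract compares. A term that occurs in every payload gets the minimum IDF weight of 1.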
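Mutual information is one of the four feature selection techniques named in the abstract. The following is a minimal sketch of mutual-information ranking. It assumes discrete (for example, binarized n-gram presence) features rather than raw TF-IDF weights, because the exact estimator used in the thesis is not stated. The helper select_top_k is hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature_values, labels):
    """I(X;Y) in nats for two equal-length sequences of discrete values."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    px = Counter(feature_values)
    py = Counter(labels)
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        # Sum p(x,y) * log(p(x,y) / (p(x) p(y))) over observed pairs only.
        mi += pxy * math.log(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def select_top_k(feature_matrix, labels, k):
    """Hypothetical helper: rank columns of a row-major matrix by MI, keep k."""
    columns = list(zip(*feature_matrix))
    scores = [mutual_information(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores
```

A feature that perfectly separates two balanced classes scores ln 2. A feature independent of the label scores 0, so it is dropped first.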
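Correlation-based feature selection (CFS), also named in the abstract, is commonly defined through Hall's merit heuristic. This heuristic scores a subset S of k features by its average feature-class correlation relative to its average feature-feature correlation. The formula below is the standard definition and is given for reference only. The abstract does not state which correlation measure the thesis uses.

```latex
\mathrm{Merit}_S = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}
```

Here, \overline{r_{cf}} is the mean correlation between the features in S and the class, and \overline{r_{ff}} is the mean pairwise correlation among those features. A subset therefore scores highly when its features are predictive of the class but not redundant with one another.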
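The abstract reports accuracy, precision, recall and F1-score for multiclass classification and validates the best model with 10-fold cross-validation. The following is a minimal sketch of both pieces. Macro averaging (the unweighted mean over classes) is an assumption, since the averaging scheme is not stated. The fold generator is plain, unshuffled k-fold. A stratified split, which keeps class proportions per fold, may be closer to what was actually used.

```python
from collections import Counter


def classification_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (assumed macro)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for plain, unshuffled k-fold CV."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

Each sample appears in exactly one test fold. Averaging classification_scores over the 10 folds gives the cross-validated estimate. Macro averaging treats rare attack classes and the normal class equally, which is one reason precision and recall can sit below overall accuracy.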