WEB APPLICATION VULNERABILITY ANALYSIS USING MACHINE LEARNING

dc.contributor.author Marrium Mehmood, 01-241201-009
dc.date.accessioned 2022-12-20T08:18:11Z
dc.date.available 2022-12-20T08:18:11Z
dc.date.issued 2022
dc.identifier.uri http://hdl.handle.net/123456789/14455
dc.description Supervised by Dr. Tamim Ahmed Khan en_US
dc.description.abstract A substantial increase in web-based applications has changed business perspectives, and web applications have become almost essential to every business for online transactions. This has paved the way for cyber-attacks in which attackers exploit vulnerabilities in online systems for illegal system usage. To protect web applications, various machine-learning-based detection systems have been proposed. Most of these systems detect only a few, often outdated, attacks because this area of study lacks a good labelled dataset containing modern attacks. This thesis proposes payload-based web attack detection that considers modern attacks as identified by OWASP and NIST. The proposed system detects web attacks by analyzing the request payload. It is designed in two stages: pre-processing and processing. The pre-processing stage consists of dataset creation, feature extraction and feature selection. To evaluate the system experimentally, we used the publicly available HTTP Param dataset in addition to our own payload-based dataset. We implemented an automatic feature extraction technique that uses a TF-IDF vectorizer to extract features from the payload and improve performance. Three types of n-grams (unigram, bigram and trigram) are used separately, and their results are analyzed. We implemented four feature selection techniques, Correlation-based Feature Selection (CFS), mutual information, random forest importance and Principal Component Analysis (PCA), to obtain the best feature subset. The processing stage comprises multiple classifiers, implemented in order to compare their results and performance. In particular, we implemented four machine learning classifiers (decision tree, random forest, logistic regression and K-Nearest Neighbors), fed them different feature subsets as input and analyzed the results. We performed multiclass classification of attacks and evaluated the results using multiple evaluation measures.
In addition, the results of the best-performing model are validated using 10-fold cross-validation. Our system detects 8 vulnerabilities, namely SQL injection, Cross-Site Scripting, XML external entities, command injection, open redirect, carriage return and line feed (CRLF) injection, path traversal and file inclusion, as well as normal requests. The comparative analysis of the experimental results shows that embedded feature selection on bigram features, combined with the random forest classifier, achieves the highest scores: 99.48% accuracy, 98.66% precision, 96.50% recall and 97.41% F1-score. en_US
dc.language.iso en en_US
dc.publisher Software Engineering, Bahria University Engineering School Islamabad en_US
dc.relation.ispartofseries MS-SE;T-1823
dc.subject Software Engineering en_US
dc.title WEB APPLICATION VULNERABILITY ANALYSIS USING MACHINE LEARNING en_US
dc.type MS Thesis en_US
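The processing stage (feature-selection methods crossed with four classifiers, then 10-fold cross-validation of the best pair) can be sketched in the same hedged way. Assumptions, none taken from the thesis: character bigram TF-IDF features, a hypothetical toy dataset from `make_toy_dataset`, a 75/25 stratified hold-out split ranked by macro-F1, `SelectFromModel` over a random forest as the embedded (random-forest-importance) method, and scikit-learn's `TruncatedSVD` swapped in for PCA because it accepts sparse TF-IDF input. CFS is left out because scikit-learn has no built-in implementation.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectFromModel, SelectPercentile, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

SEED = 0
METRICS = ("accuracy", "precision_macro", "recall_macro", "f1_macro")


def make_toy_dataset(n_per_class=20):
    """Generate hypothetical payloads for four classes (illustration only)."""
    payloads, labels = [], []
    for i in range(n_per_class):
        payloads += [
            f"id={i}&name=user{i}",
            f"id={i}' OR '{i}'='{i}' --",
            f"q=<script>alert({i})</script>",
            f"file={'../' * (i % 5 + 1)}etc/passwd",
        ]
        labels += ["normal", "sqli", "xss", "path_traversal"]
    return payloads, labels


# Feature-selection stand-ins; mutual_info_classif treats sparse input as discrete.
SELECTORS = {
    "mutual_info": lambda: SelectPercentile(mutual_info_classif, percentile=50),
    "rf_importance": lambda: SelectFromModel(RandomForestClassifier(n_estimators=50, random_state=SEED)),
    "svd_pca_like": lambda: TruncatedSVD(n_components=20, random_state=SEED),
}
CLASSIFIERS = {
    "decision_tree": lambda: DecisionTreeClassifier(random_state=SEED),
    "random_forest": lambda: RandomForestClassifier(n_estimators=100, random_state=SEED),
    "logistic_regression": lambda: LogisticRegression(max_iter=1000),
    "knn": lambda: KNeighborsClassifier(n_neighbors=5),
}


def build(selector, classifier, ngram=(2, 2)):
    """TF-IDF -> feature selection -> classifier, fitted as one pipeline."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(analyzer="char", ngram_range=ngram)),
        ("select", SELECTORS[selector]()),
        ("clf", CLASSIFIERS[classifier]()),
    ])


def compare(payloads, labels):
    """Hold-out macro-F1 for every (selector, classifier) pair."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        payloads, labels, test_size=0.25, stratify=labels, random_state=SEED)
    results = {}
    for sel in SELECTORS:
        for clf in CLASSIFIERS:
            model = build(sel, clf).fit(X_tr, y_tr)
            results[(sel, clf)] = f1_score(y_te, model.predict(X_te), average="macro")
    return results


def cross_validate_best(payloads, labels, results, n_splits=10):
    """Re-evaluate the best hold-out pair with stratified k-fold cross-validation."""
    best = max(results, key=results.get)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=SEED)
    scores = cross_validate(build(*best), payloads, labels, cv=cv, scoring=list(METRICS))
    return best, {m: scores[f"test_{m}"].mean() for m in METRICS}


if __name__ == "__main__":
    payloads, labels = make_toy_dataset()
    results = compare(payloads, labels)
    best, cv_scores = cross_validate_best(payloads, labels, results)
    print("best pair:", best, cv_scores)
```

Passing `ngram=(1, 1)` or `ngram=(3, 3)` to `build` gives the unigram and trigram variants. Macro-averaged precision, recall and F1 weight every class equally; whether the thesis averaged its multiclass scores this way is an assumption here.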

