Abstract:
Web applications and networks are part of everyday life through online business, education, and social networking. Web and network security is therefore an important research topic for protecting the internet against hacking attacks. Researchers have proposed many classical machine learning and deep learning (DL) intrusion detection approaches to improve web and network security. However, most of these studies used old and imbalanced datasets. This study evaluates the effectiveness of a deep learning model on recent datasets that contain more attack types than those used in previous studies, and performs multi-class classification of these attacks. The proposed solution consists of four phases: statistical analysis of the datasets, pre-processing, processing, and statistical analysis of the proposed model. For the statistical analysis of the datasets, we performed a one-sample t-test; the p-value analysis shows that all features of the datasets significantly affect the dependent variable. During pre-processing, we dropped missing values, removed duplicates, applied feature selection, and balanced the datasets. We implemented a correlation-based feature selection technique to improve model performance, and we applied the Synthetic Minority Over-sampling Technique (SMOTE) to balance the datasets and improve the prediction accuracy for minority classes. In the processing step, we conducted five experiments, varying the number of Deep Neural Network (DNN) layers from 1 to 5. The network parameters were selected by hyperparameter tuning, and DNN4, with the ReLU activation function, a learning rate of 0.001, a dropout rate of 0.01, and 100 epochs, outperforms all other configurations. In addition, the proposed DNN model is validated with stratified ten-fold cross-validation. The proposed model classifies traffic into twenty attack classes plus a benign class. The twenty-one classes are Infiltration, Backdoor, DDoS, Injection, Brute Force, DoS, Analysis, Shellcode, Ransomware, MITM, Bot, Theft, Worms, Scanning, Password, Generic, Reconnaissance, Exploits, Fuzzers, and Benign. The experimental results show that DNN4 achieves the highest accuracy of 0.933, with 0.93 precision, 0.911 recall, 0.910 F1-score, and 0.997 AUC. We also calculated the bias and variance of the model to check for underfitting and overfitting. The best model's bias (training error) is 0.066, and its variance (testing error) is 0.0725.
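The abstract names SMOTE as the balancing step but does not give its settings. The sketch below is a minimal pure-Python illustration of SMOTE's core idea, assuming numeric feature vectors and nearest neighbours found by Euclidean distance within each minority class. The function name `smote` and its parameters `k` and `seed` are hypothetical and not the authors' code; in practice a library implementation such as `imblearn.over_sampling.SMOTE` from imbalanced-learn would normally be used.

```python
import math
import random
from collections import Counter


def smote(X, y, k=5, seed=0):
    """Minimal SMOTE sketch (hypothetical helper): oversample each minority
    class up to the size of the largest class.

    X is a list of numeric feature vectors, y the matching list of labels.
    Returns (X_res, y_res): copies of the original rows followed by synthetic rows.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res = [list(row) for row in X]
    y_res = list(y)

    for label, n in counts.items():
        # Classes already at the target size need no new rows; a class with a
        # single sample has no neighbour to interpolate towards, so it is skipped.
        if n == target or n < 2:
            continue
        members = [X[i] for i, lab in enumerate(y) if lab == label]

        # Precompute the k nearest same-class neighbours (Euclidean distance).
        neighbours = []
        for i, p in enumerate(members):
            others = [q for j, q in enumerate(members) if j != i]
            others.sort(key=lambda q: math.dist(p, q))
            neighbours.append(others[:k])

        # Each synthetic row lies on the segment between a random minority
        # sample and one of its randomly chosen nearest neighbours.
        for _ in range(target - n):
            i = rng.randrange(n)
            base, nb = members[i], rng.choice(neighbours[i])
            gap = rng.random()  # interpolation factor in [0, 1)
            X_res.append([b + gap * (m - b) for b, m in zip(base, nb)])
            y_res.append(label)

    return X_res, y_res


if __name__ == "__main__":
    # Toy data (made up for illustration): four "Benign" rows, two "DDoS" rows.
    X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0], [5.2, 5.1]]
    y = ["Benign"] * 4 + ["DDoS"] * 2
    X_res, y_res = smote(X, y, k=1)
    print(Counter(y_res))
```

Run directly, the demo prints the class counts after resampling, with four rows per class. Because synthetic rows are interpolated from existing ones, oversampling is usually applied only to the training portion of each cross-validation split, so that evaluation folds contain real samples only.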