CODE SMELLS DETECTION USING DEEP LEARNING METHODS

AMMARAH WAHEED, 01-241211-001

URI: http://hdl.handle.net/123456789/14727

Date: 2022

Abstract:

Code smells can degrade software quality over time, and software containing code smells is more likely to be change-prone or fault-prone than software without them. If code smells are not detected in the early phases of software development, the effort required to remove the issues they cause grows rapidly. Many code smells are described in the literature, and detecting them is not easy. Consequently, numerous methods for detecting these design defects have been studied and proposed. Several automated approaches based on machine learning and deep learning have been implemented to detect code smells and thereby improve software quality. However, these code smell detection models consider a limited number of smells and classify code smells into binary classes. To overcome these issues, this thesis proposes a multi-class classification-based code smell detection system that covers a larger number of code smells. The proposed system detects code smells by analyzing code metrics. The system is built with ensemble machine learning and deep learning algorithms with the aim of improving performance. Our system is designed in two stages: pre-processing and processing. The pre-processing stage consists of dataset collection, dataset cleaning, transformation, label encoding and one-hot encoding. To evaluate our system experimentally, we use the publicly available dataset of Fontana et al., which contains metrics extracted from the software systems of the Qualitas Corpus. The processing stage comprises implementing classifiers and evaluating the results. In particular, we implement machine learning classifiers, including ensemble methods: Decision Tree, Random Forest, Support Vector Machine, Naïve Bayes and Logistic Regression. We also implement a deep learning classifier, feed the features to it as input and analyze the results. We perform multi-class classification of code smells and evaluate the results using multiple evaluation measures.
In addition, the results of the best-performing model are validated using k-fold cross-validation. Our system can detect six code smells: Long Method, Feature Envy, Long Parameter List and Switch Statement at the method level, and God Class and Data Class at the class level. A comparative analysis of the experimental results shows that the Artificial Neural Network achieves the highest accuracy: 99.57% at the method level and 98.77% at the class level.
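
The label encoding, one-hot encoding and k-fold splitting steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the thesis implementation: the function names, the sample labels, the extra "No Smell" class and the choice of k = 3 are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def label_encode(labels: Sequence[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct label to an integer id (sorted for determinism)."""
    mapping = {name: idx for idx, name in enumerate(sorted(set(labels)))}
    return [mapping[name] for name in labels], mapping


def one_hot(ids: Sequence[int], num_classes: int) -> List[List[int]]:
    """Turn integer ids into one-hot rows of length num_classes."""
    return [[1 if col == i else 0 for col in range(num_classes)] for i in ids]


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds of (train, test) indices."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


# Hypothetical method-level labels; "No Smell" is an assumed extra class.
labels = ["Long Method", "Feature Envy", "No Smell", "Long Method",
          "Switch Statement", "Long Parameter List"]
ids, mapping = label_encode(labels)
targets = one_hot(ids, len(mapping))
splits = k_fold_indices(len(labels), k=3)
```

In a multi-class setting, one-hot targets such as these would typically feed a neural network with one output unit per smell class, while each (train, test) pair from the k-fold split would be used to fit and score the model once. Real pipelines usually shuffle or stratify the samples before splitting, which this sketch omits.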