Abstract:
Code smells are structural characteristics of software under development that indicate poor
design or implementation choices; they can lead to errors or failures and make the software
difficult to evolve and maintain. The objective of this study is to assure software quality by
predicting the probability of faults from software metrics, code smells, and code smell metrics.
We consider three types of code smell-based datasets: code smells only; code smells and
metrics; and code smell metrics combined with metrics. We label the unlabeled datasets using
clustering and pseudo-labeling techniques. We implement models based on ensemble methods
and deep learning algorithms, perform ten experiments, and compare the performance of these
code smell-based datasets. We perform binary classification of faults and evaluate the results
using multiple evaluation measures.
In addition, the model results are cross-validated using k-fold cross-validation, and we use
statistical tests to assess the significance of the models. The comparative analysis of the
experimental results demonstrates that ensemble methods and deep learning approaches using
the code smells and metrics dataset are effective for code smell-based defect prediction.
Our models outperform state-of-the-art approaches. Datasets that include both code smells
and metrics perform better than the other smell-related datasets. We conclude that code
smell-based software defect prediction achieves high accuracy and precision.
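
To make the described pipeline concrete, the following is a minimal sketch, assuming a
scikit-learn setup, of the general idea stated above: clustering-based pseudo-labeling of an
unlabeled dataset followed by k-fold cross-validation of an ensemble classifier. The synthetic
feature matrix, the choice of KMeans and a random forest, and all parameter values are
illustrative assumptions, not the study's exact configuration.

    # Illustrative sketch only: pseudo-labeling an unlabeled code-smell/metric dataset
    # via KMeans clustering, then 10-fold cross-validation of an ensemble classifier.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(42)

    # Hypothetical feature matrix: each row is a module, columns stand in for
    # code smell indicators and software metrics (e.g., smell count, LOC, complexity).
    X = rng.random((200, 6))

    # Pseudo-labeling: cluster modules into two groups and treat the cluster
    # assignments as provisional fault / non-fault labels.
    pseudo_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)

    # Binary classification with an ensemble method, evaluated by 10-fold cross-validation.
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    scores = cross_val_score(model, X, pseudo_labels, cv=10, scoring="f1")
    print(f"Mean F1 across folds: {scores.mean():.3f}")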