A Comparative Analysis of Techniques, Datasets, Feature Selection Methods, and Evaluation Metrics in Software Fault Prediction
Rajinder Kumar1, Kamaljit Kaur2
1Rajinder Kumar, Research Scholar, Department of Computer Science and Engineering, Sri Guru Granth Sahib World University, Fatehgarh Sahib (Punjab), India and Assistant Professor, Department of Computer Applications, Chandigarh Business School of Administration, Landran, Mohali (Punjab), India.
2Dr. Kamaljit Kaur, Assistant Professor, Department of Computer Science, Sri Guru Granth Sahib World University, Fatehgarh Sahib (Punjab), India.
Manuscript received on 12 June 2025 | First Revised Manuscript received on 05 July 2025 | Second Revised Manuscript received on 10 July 2025 | Manuscript Accepted on 15 July 2025 | Manuscript published on 30 July 2025 | PP: 25-41 | Volume-13 Issue-8, July 2025 | Retrieval Number: 100.1/ijese.B828014020725 | DOI: 10.35940/ijese.B8280.13080725
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: This study presents a systematic literature review (SLR) of recent advancements in Software Fault Prediction (SFP) methodologies. The review focuses on key dimensions, including techniques, datasets, feature selection methods, software metrics, and evaluation criteria. Five research questions were defined to guide the assessment of current trends in SFP research, and significant studies from digital libraries such as ACM, IEEE, SpringerLink, and ScienceDirect were analyzed. Findings reveal that machine learning approaches, particularly neural networks, deep learning, and ensemble methods, are increasingly employed because they can manage the complexity of software fault data. Public datasets, notably those from the PROMISE and NASA MDP repositories, are widely used, underlining the importance of dataset diversity for enhancing model performance. Feature selection methods, particularly wrapper techniques, are often employed to improve predictive accuracy. Model evaluation relies predominantly on confusion matrix-based metrics such as Accuracy, Precision, Recall, and F1-Score. Despite these advances, challenges remain in addressing class imbalance, adapting to rapidly evolving software environments, and achieving real-time fault prediction. The study highlights the need for greater classifier diversity and ongoing methodological improvements to enhance the robustness and generalizability of SFP models.
Keywords: Software Fault Prediction; Feature Selection Techniques; Software Metrics; Public Datasets; Confusion Matrix-Based Metrics; Class Imbalance.
Scope of the Article: Software Engineering and Applications
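The confusion matrix-based metrics named in the abstract (Accuracy, Precision, Recall, and F1-Score) can be illustrated with a minimal Python sketch. It uses the standard textbook definitions and is not taken from the reviewed studies. The class name `ConfusionMatrix`, the helper names, and the module counts are hypothetical, chosen only to show why Accuracy alone can mislead under the class imbalance that the abstract lists as an open challenge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary fault predictor (illustrative, hypothetical type)."""
    tp: int  # faulty modules predicted faulty
    fp: int  # clean modules predicted faulty
    fn: int  # faulty modules predicted clean
    tn: int  # clean modules predicted clean


def safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when a metric is undefined."""
    return numerator / denominator if denominator else 0.0


def metrics(cm: ConfusionMatrix) -> dict:
    """Compute the four metrics from the standard definitions."""
    total = cm.tp + cm.fp + cm.fn + cm.tn
    precision = safe_div(cm.tp, cm.tp + cm.fp)
    recall = safe_div(cm.tp, cm.tp + cm.fn)
    return {
        "accuracy": safe_div(cm.tp + cm.tn, total),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Assumed toy data: 100 modules, of which only 10 are faulty.
# A predictor that labels every module clean finds no faults at all.
always_clean = metrics(ConfusionMatrix(tp=0, fp=0, fn=10, tn=90))

# A predictor that flags 15 modules and catches 6 of the 10 faults.
modest = metrics(ConfusionMatrix(tp=6, fp=9, fn=4, tn=81))

if __name__ == "__main__":
    for name, result in (("always_clean", always_clean), ("modest", modest)):
        print(name, {k: round(v, 2) for k, v in result.items()})
```

In this toy setting the all-clean predictor scores higher on Accuracy than the predictor that actually finds faults, while its Recall and F1-Score are zero. That is one reason the four metrics are usually reported together rather than Accuracy alone.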