Abstract
Background: Ensemble techniques have gained attention
in various scientific fields. Defect prediction researchers have
investigated many state-of-the-art ensemble models and concluded that in many cases these outperform standard single
classifier techniques. Almost all previous work using ensemble
techniques in defect prediction relies on the majority
voting scheme for combining prediction outputs, and on
the implicit diversity among single classifiers. Aim: To investigate
whether defect prediction can be improved using an explicit
diversity technique with a stacking ensemble, given
that different classifiers identify different sets of defects.
Method: We used classifiers from four different families and
the weighted accuracy diversity (WAD) technique to exploit
diversity amongst classifiers. To combine individual predictions,
we used the stacking ensemble technique. We used
state-of-the-art knowledge in software defect prediction to
build our ensemble models, and tested their prediction abilities
against 8 publicly available data sets. Conclusion:
The results show performance improvement using stacking
ensembles compared to other defect prediction models. Diversity
amongst classifiers used for building ensembles is essential
to achieving these performance improvements.
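
The contrast between majority voting and stacking described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the three base classifiers, their predictions, and the table-based meta-learner are all hypothetical, and the WAD diversity measure is not reproduced here because the abstract does not define it. In a real stacking setup the meta-learner would typically be trained on out-of-fold predictions of the base classifiers.

```python
from collections import Counter, defaultdict


def majority_vote(votes):
    """Combine base predictions by simple majority (ties go to 0, non-defective)."""
    return 1 if sum(votes) * 2 > len(votes) else 0


def fit_meta_table(base_preds, labels):
    """Toy meta-learner (an assumption for illustration): for each pattern of
    base predictions, remember the most common true label seen in training."""
    counts = defaultdict(Counter)
    for votes, y in zip(base_preds, labels):
        counts[tuple(votes)][y] += 1
    return {pattern: c.most_common(1)[0][0] for pattern, c in counts.items()}


def stacked_predict(table, votes):
    """Predict with the meta-learner; fall back to majority vote if the
    pattern of base predictions was never seen during training."""
    return table.get(tuple(votes), majority_vote(votes))


# Hypothetical held-out predictions from three base classifiers (A, B, C).
# 1 = defective, 0 = non-defective. Classifier C alone catches some defects.
train_preds = [(0, 0, 1), (0, 0, 1), (1, 1, 0), (1, 1, 0), (1, 1, 1), (0, 0, 0)]
train_labels = [1, 1, 0, 0, 1, 0]

meta_table = fit_meta_table(train_preds, train_labels)

# A new module flagged only by classifier C.
new_votes = (0, 0, 1)
vote_result = majority_vote(new_votes)
stack_result = stacked_predict(meta_table, new_votes)
print("majority vote:", vote_result)
print("stacking:", stack_result)
```

In this made-up data, majority voting discards the lone prediction of classifier C, while the meta-learner has learned that C's dissenting vote tends to be correct. This is the sense in which a learned combiner can exploit classifiers that identify different sets of defects.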
Original language | English |
---|---|
Title of host publication | Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement |
Place of Publication | New York, NY, USA |
Publisher | ACM Press |
Pages | 46:1-46:10 |
ISBN (Print) | 978-1-4503-4427-2 |
Publication status | Published - 9 Sept 2016 |
Event | IEEE International Symposium on Empirical Software Engineering and Measurement - Ciudad Real, Spain; Duration: 8 Sept 2016 → 9 Sept 2016 |
Publication series
Name | ESEM '16 |
---|---|
Publisher | ACM |
Conference
Conference | IEEE International Symposium on Empirical Software Engineering and Measurement |
---|---|
Country/Territory | Spain |
City | Ciudad Real |
Period | 8/09/16 → 9/09/16 |
Keywords
- Software defect prediction, diversity, ensembles of learning machines, software faults, stacking