The Impact of Class Rebalancing Techniques on the Performance and Interpretation of Defect Prediction Models (ICSE 2019 - Journal-First Papers) - International Conference on Software Engineering 2019 in Montreal, Canada


Sat 25 - Fri 31 May 2019 Montreal, QC, Canada

Who

Kla Tantithamthavorn, Ahmed E. Hassan, Kenichi Matsumoto

Track

ICSE 2019 Journal-First Papers



When

Fri 31 May 2019 11:50 - 12:00 at Laurier - Defect Prediction
Chair(s): Burak Turhan

Abstract

Defect models that are trained on class-imbalanced datasets (i.e., datasets in which defective and clean modules are not equally represented) are highly susceptible to producing inaccurate predictions. Prior research compares the impact of class rebalancing techniques on the performance of defect models but arrives at contradictory conclusions because the studies use different datasets, classification techniques, and performance measures. Such contradictory conclusions make it hard to derive practical guidelines for whether class rebalancing techniques should be applied in the context of defect models. In this paper, we investigate the impact of class rebalancing techniques on performance measures and the interpretation of defect models. We also investigate the experimental settings in which class rebalancing techniques are beneficial for defect models. Through a case study of 101 datasets that span proprietary and open-source systems, we conclude that the impact of class rebalancing techniques on the performance of defect prediction models depends on the performance measure and the classification technique used. We observe that the optimized SMOTE technique and the under-sampling technique are beneficial when quality assurance teams wish to increase AUC and Recall, respectively, but they should be avoided when deriving knowledge and understanding from defect models.
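To make the idea of class rebalancing concrete, the following is a minimal sketch of random under-sampling, one of the techniques named in the abstract. It is not the authors' implementation: the function name `random_undersample`, the toy dataset, and the 90/10 class split are all hypothetical and chosen only for illustration. The optimized SMOTE technique is not shown.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until every class
    has as many rows as the smallest class.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)

    # Group rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is reduced to the size of the smallest class.
    target = min(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        kept = rng.sample(members, target)  # sample without replacement
        out_rows.extend(kept)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Assumed toy data: 90 clean modules (label 0) and 10 defective modules (label 1).
rows = [[i] for i in range(100)]
labels = [0] * 90 + [1] * 10

balanced_rows, balanced_labels = random_undersample(rows, labels)
print("before:", Counter(labels))
print("after: ", Counter(balanced_labels))
```

In this sketch the rebalanced data keeps all 10 defective modules and a random 10 of the 90 clean ones. Rebalancing of this kind is typically applied to the training data only, so that the model is still evaluated on the original class distribution.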

Link to Preprint

http://chakkrit.com/assets/papers/tantithamthavorn2018imbalance.pdf

Kla Tantithamthavorn

Monash University

Australia

Ahmed E. Hassan

Queen's University

Canada

Kenichi Matsumoto

Nara Institute of Science and Technology