DefeXts: A Curated Dataset of Reproducible Real-World Bugs for Modern JVM Languages
Software engineering studies, such as those on bug detection, localization, repair, and prediction, often require benchmark bug datasets for their experiments. However, few publicly available, reproducible bug datasets exist for research use. Those that do exist tend to apply only to traditional programming languages (e.g., Defects4J for Java and CoreBench for C). Thus, the creation and widespread use of bug datasets for popular modern JVM (Java Virtual Machine) programming languages would provide a vital resource for software research. This paper introduces DefeXts, a family of extendable bug datasets currently containing child bug datasets for Kotlin (DefeKts) and Groovy (DefeGts). Each dataset contains reproducible bugs and their corresponding patches scraped from real-world projects. Our introductory versions of DefeKts and DefeGts include 225 Kotlin and 301 Groovy bugs with their patches. As development of DefeXts continues, we aim to include other modern JVM languages, notably Scala.