BugSwarm: Mining and Continuously Growing a Dataset of Reproducible Failures and Fixes
Technical Track
Fault-detection, localization, and repair methods are vital to software quality, but it is difficult to evaluate their generality, applicability, and current effectiveness. Large, diverse, realistic datasets of durably reproducible faults and fixes are vital to good experimental evaluation of approaches to software quality, but they are difficult and expensive to assemble and keep current. Modern continuous-integration (CI) approaches, like Travis-CI, which are widely used, fully configurable, and executed within custom-built containers, promise a path toward much larger defect datasets. If we can identify and archive failing and subsequent passing runs, the containers will provide substantial assurance of durable future reproducibility of build and test. Several obstacles, however, must be overcome to make this a practical reality. We describe BugSwarm, a toolset that navigates these obstacles to enable the creation of a scalable, diverse, realistic, continuously growing set of durably reproducible failing and passing versions of real-world, open-source systems. The BugSwarm toolkit has already gathered 3,091 fail-pass pairs, in Java and Python, all packaged within fully reproducible containers. Furthermore, the toolkit can be run periodically to detect fail-pass activities, thus growing the dataset continually.
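The idea of identifying "failing and subsequent passing runs" can be illustrated with a minimal sketch. This is not the BugSwarm implementation: the `Build` record, its fields, and the `find_fail_pass_pairs` helper are hypothetical names chosen for illustration, and the sketch assumes that build IDs increase in chronological order and that a pair is simply a failed build immediately followed by a passed build on the same branch.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Build:
    """One CI build record (hypothetical schema, not the BugSwarm data model)."""
    build_id: int   # assumed to increase with time
    branch: str
    status: str     # "passed" or "failed"


def find_fail_pass_pairs(builds: Iterable[Build]) -> List[Tuple[Build, Build]]:
    """Return (failed, passed) pairs of consecutive builds on the same branch."""
    pairs: List[Tuple[Build, Build]] = []
    last_by_branch: Dict[str, Build] = {}  # most recent build seen per branch
    for build in sorted(builds, key=lambda b: b.build_id):
        previous = last_by_branch.get(build.branch)
        if (
            previous is not None
            and previous.status == "failed"
            and build.status == "passed"
        ):
            pairs.append((previous, build))
        last_by_branch[build.branch] = build
    return pairs


# Illustrative, made-up build history.
history = [
    Build(1, "master", "passed"),
    Build(2, "master", "failed"),
    Build(3, "dev", "failed"),
    Build(4, "master", "passed"),
    Build(5, "dev", "failed"),
    Build(6, "dev", "passed"),
]

for failed, passed in find_fail_pass_pairs(history):
    print(f"{failed.branch}: build {failed.build_id} failed, build {passed.build_id} passed")
```

Tracking the last build per branch keeps the scan linear after sorting. A real pipeline would additionally need to check that each pair can be rebuilt and reproduced inside its container, which this sketch does not attempt.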
Wed 29 May. Times are displayed in time zone: Eastern Time (US & Canada).
16:00 - 18:00: SE Datasets, Research Infrastructure, and Methodology (Papers / Journal-First Papers / New Ideas and Emerging Results / Demonstrations / Technical Track) at Viger. Chair(s): Rashina Hoda (The University of Auckland)
16:00 - 16:20 Talk | BugSwarm: Mining and Continuously Growing a Dataset of Reproducible Failures and Fixes | Technical Track | Naji Dmeiri (University of California, Davis), David A Tomassi (University of California, Davis), Yichen Wang (University of California, Davis), Antara Bhowmick (University of California, Davis), Yen-Chuan Liu (University of California, Davis), Prem Devanbu (University of California), Bogdan Vasilescu (Carnegie Mellon University), Cindy Rubio-González (University of California, Davis)
16:20 - 16:40 Talk | DefeXts: A Curated Dataset of Reproducible Real-World Bugs for Modern JVM Languages | Demonstrations | Samuel Benton (The University of Texas at Dallas), Ali Ghanbari (The University of Texas at Dallas), Lingming Zhang
16:40 - 16:50 Talk | Open Collaborative Data – using OSS principles to share data in SW engineering | New Ideas and Emerging Results (NIER) | Per Runeson (Lund University)
16:50 - 17:00 Talk | Leveraging Small Software Engineering Data Sets with Pre-trained Neural Networks | New Ideas and Emerging Results (NIER)
17:00 - 17:20 Talk | ActionNet: Vision-based Workflow Action Recognition From Programming Screencasts | Technical Track | Dehai Zhao, Zhenchang Xing (Australia National University), Chunyang Chen (Monash University), Xin Xia (Monash University), Guoqiang Li (Shanghai Jiao Tong University)
17:20 - 17:30 Talk | The ABC of Software Engineering Research | Journal-First Papers | Klaas-Jan Stol (University College Cork and Lero, Ireland), Brian Fitzgerald (Lero - The Irish Software Research Centre and University of Limerick)
17:30 - 17:40 Talk | Mining Plausible Hypotheses from the Literature via Meta-Analysis | New Ideas and Emerging Results (NIER) | Vladimir Ivanov, Giancarlo Succi (Innopolis University), Jooyong Yi (UNIST, Ulsan National Institute of Science and Technology)
17:40 - 17:50 Talk | Analyzing Families of Experiments in SE: a Systematic Mapping Study | Journal-First Papers | Adrian Santos Parrilla, Omar Gomez (Escuela Superior Politecnica de Chimborazo Riobamba), Natalia Juristo (Universidad Politecnica de Madrid)
17:50 - 18:00 Talk | Discussion Period | Papers