BugSwarm: Mining and Continuously Growing a Dataset of Reproducible Failures and Fixes | Technical Track
Fault-detection, localization, and repair methods are vital to software quality, but it is difficult to evaluate their generality, applicability, and current effectiveness. Large, diverse, realistic datasets of durably reproducible faults and fixes are essential for good experimental evaluation of approaches to software quality, but they are difficult and expensive to assemble and keep current. Modern continuous-integration (CI) approaches, like Travis-CI, which are widely used, fully configurable, and executed within custom-built containers, promise a path toward much larger defect datasets. If we can identify and archive failing and subsequent passing runs, the containers will provide substantial assurance that the build and test can be durably reproduced in the future. Several obstacles, however, must be overcome to make this a practical reality. We describe BugSwarm, a toolset that navigates these obstacles to enable the creation of a scalable, diverse, realistic, continuously growing set of durably reproducible failing and passing versions of real-world, open-source systems. The BugSwarm toolkit has already gathered 3,091 fail-pass pairs in Java and Python, all packaged within fully reproducible containers. Furthermore, the toolkit can be run periodically to detect new fail-pass activity, thus growing the dataset continually.
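The first mining step the abstract describes, spotting a failing CI run that is followed by a passing one, can be illustrated with a minimal sketch. This is not BugSwarm's actual implementation: the `Build` record, its fields, and `find_fail_pass_pairs` are hypothetical, and the sketch assumes the build history is already sorted chronologically. It pairs only consecutive builds on the same branch and ignores the harder steps the abstract alludes to, such as confirming that both runs actually reproduce inside a container.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Build:
    # Hypothetical CI build record; field names are illustrative only.
    build_id: int
    branch: str
    commit: str
    passed: bool


def find_fail_pass_pairs(builds: List[Build]) -> List[Tuple[Build, Build]]:
    """Return (failed, passed) pairs where a failing build is immediately
    followed by a passing build on the same branch.

    Assumes `builds` is sorted chronologically (oldest first).
    """
    last_on_branch: Dict[str, Build] = {}  # most recent build seen per branch
    pairs: List[Tuple[Build, Build]] = []
    for build in builds:
        prev = last_on_branch.get(build.branch)
        # A candidate pair: previous build on this branch failed, this one passed.
        if prev is not None and not prev.passed and build.passed:
            pairs.append((prev, build))
        last_on_branch[build.branch] = build
    return pairs


# Usage: two branches with interleaved builds.
history = [
    Build(1, "main", "a1", True),
    Build(2, "main", "b2", False),
    Build(3, "dev", "c3", False),
    Build(4, "main", "d4", True),
    Build(5, "dev", "e5", True),
    Build(6, "dev", "f6", True),
]
pairs = find_fail_pass_pairs(history)
print([(f.build_id, p.build_id) for f, p in pairs])  # [(2, 4), (3, 5)]
```

Tracking the last build per branch keeps the scan linear in the number of builds; a fail-fail-pass sequence yields only the pair (second failure, pass), since that is the failing state the fix most directly resolves.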
Wed 29 May | Displayed time zone: Eastern Time (US & Canada)
16:00 - 18:00 | SE Datasets, Research Infrastructure, and Methodology | Journal-First Papers / New Ideas and Emerging Results / Demonstrations / Papers / Technical Track | at Viger | Chair(s): Rashina Hoda (The University of Auckland)
16:00 | 20m Talk | BugSwarm: Mining and Continuously Growing a Dataset of Reproducible Failures and Fixes | Technical Track | Naji Dmeiri (University of California, Davis), David A Tomassi (University of California, Davis), Yichen Wang (University of California, Davis), Antara Bhowmick (University of California, Davis), Yen-Chuan Liu (University of California, Davis), Prem Devanbu (University of California), Bogdan Vasilescu (Carnegie Mellon University), Cindy Rubio-González (University of California, Davis)
16:20 | 20m Talk | DefeXts: A Curated Dataset of Reproducible Real-World Bugs for Modern JVM Languages | Demonstrations | Samuel Benton (The University of Texas at Dallas), Ali Ghanbari (The University of Texas at Dallas), Lingming Zhang
16:40 | 10m Talk | Open Collaborative Data – using OSS principles to share data in SW engineering | New Ideas and Emerging Results (NIER) | Per Runeson (Lund University)
16:50 | 10m Talk | Leveraging Small Software Engineering Data Sets with Pre-trained Neural Networks | New Ideas and Emerging Results (NIER)
17:00 | 20m Talk | ActionNet: Vision-based Workflow Action Recognition From Programming Screencasts | Technical Track | Dehai Zhao, Zhenchang Xing (Australian National University), Chunyang Chen (Monash University), Xin Xia (Monash University), Guoqiang Li (Shanghai Jiao Tong University)
17:20 | 10m Talk | The ABC of Software Engineering Research | Journal-First Papers | Klaas-Jan Stol (University College Cork and Lero, Ireland), Brian Fitzgerald (Lero - The Irish Software Research Centre and University of Limerick)
17:30 | 10m Talk | Mining Plausible Hypotheses from the Literature via Meta-Analysis | New Ideas and Emerging Results (NIER) | Vladimir Ivanov, Giancarlo Succi (Innopolis University), Jooyong Yi (UNIST, Ulsan National Institute of Science and Technology)
17:40 | 10m Talk | Analyzing Families of Experiments in SE: a Systematic Mapping Study | Journal-First Papers | Adrian Santos Parrilla, Omar Gomez (Escuela Superior Politecnica de Chimborazo Riobamba), Natalia Juristo (Universidad Politecnica de Madrid)
17:50 | 10m Talk | Discussion Period | Papers