Analysis and Detection of Information Types of Open Source Software Issue Discussions (ICSE 2019 - Technical Track) - International Conference on Software Engineering 2019 in Montreal, Canada


Sat 25 - Fri 31 May 2019 Montreal, QC, Canada

Who

Deeksha M. Arya, Cheryl Wang, Jin L.C. Guo, Jinghui Cheng

Track

ICSE 2019 Technical Track


When

Thu 30 May 2019 11:30 - 11:50 at Place du Canada - Software Analytics Chair(s): Christian Bird

Abstract

Most modern Issue Tracking Systems (ITSs) for open source software (OSS) projects allow users to add comments to issues. Over time, these comments accumulate into discussion threads embedded with rich information about the software project, which can potentially satisfy diverse needs of OSS stakeholders. However, discovering and retrieving relevant information from the discussion threads is a challenging task. In this paper, we address this challenge by identifying the information types presented in comments of OSS issue discussions. Through qualitative content analysis of 15 complex issue threads across three projects hosted on GitHub, we uncovered 16 information types and created a labeled corpus containing 4656 sentences. Our investigation of supervised, automated classification techniques indicated that, when prior knowledge about the issue is available, Random Forest can effectively detect most sentence types using conversational features such as the sentence length and its position. When classifying sentences from new issues, Logistic Regression can yield satisfactory performance using textual features for certain information types, while falling short on others. Our work represents a nontrivial first step towards tools and techniques for identifying and obtaining the rich information recorded in the ITSs to support various software engineering activities and to satisfy diverse needs of OSS stakeholders.
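As a rough illustration of what "conversational features such as the sentence length and its position" could look like in practice, the sketch below computes a token count and a thread-relative position for each sentence in an issue thread. It is not the authors' implementation: the `SentenceFeatures` type, the `conversational_features` function, the choice of whitespace tokens, the position normalization, and the example sentences are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SentenceFeatures:
    text: str
    length_in_tokens: int     # sentence length, counted in whitespace-separated tokens
    comment_index: int        # 0-based index of the comment that contains the sentence
    relative_position: float  # position of the sentence within the whole thread, in [0, 1]


def conversational_features(thread: List[List[str]]) -> List[SentenceFeatures]:
    """Compute simple per-sentence features for one issue thread.

    `thread` is a list of comments; each comment is a list of sentences.
    (Hypothetical input shape, chosen only for this sketch.)
    """
    total = sum(len(comment) for comment in thread)
    features: List[SentenceFeatures] = []
    seen = 0  # number of sentences already visited in thread order
    for comment_index, comment in enumerate(thread):
        for sentence in comment:
            # Normalize so the first sentence maps to 0.0 and the last to 1.0.
            relative = seen / (total - 1) if total > 1 else 0.0
            features.append(
                SentenceFeatures(
                    text=sentence,
                    length_in_tokens=len(sentence.split()),
                    comment_index=comment_index,
                    relative_position=relative,
                )
            )
            seen += 1
    return features


# Made-up thread, for illustration only.
example_thread = [
    ["The build fails on Windows.", "Here is the stack trace."],
    ["Can you share your compiler version?"],
    ["Fixed in the latest commit."],
]
rows = conversational_features(example_thread)
```

Feature rows like these could then be passed, together with the sentence labels, to any supervised classifier; which features and which model settings the paper actually used is not stated in this abstract.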

Link to Preprint

https://arxiv.org/abs/1902.07093

DOI

https://doi.org/10.5281/zenodo.2577268

Deeksha M. Arya

McGill University

Cheryl Wang

McGill University

Jin L.C. Guo

McGill University

Jinghui Cheng

Polytechnique Montreal

Canada


Session Program

Thu 30 May
Displayed time zone: Eastern Time (US & Canada)

11:00 - 12:30	Software Analytics (Journal-First Papers / Software Engineering in Practice / New Ideas and Emerging Results / Papers / Technical Track) at Place du Canada. Chair(s): Christian Bird (Microsoft Research)

11:00 30m Talk		(SEIP Talk) Take Control: (On the Unreasonable Effectiveness of Software Analytics). Track: Software Engineering in Practice (Industry Program). Tim Menzies (North Carolina State University)
11:30 20m Talk		Analysis and Detection of Information Types of Open Source Software Issue Discussions. Track: Technical Track. Deeksha M. Arya (McGill University), Cheryl Wang (McGill University), Jin L.C. Guo (McGill University), Jinghui Cheng (Polytechnique Montreal) [DOI, Pre-print]
11:50 10m Talk		Automating Intention Mining. Track: Journal-First Papers. Qiao Huang, Xin Xia (Monash University), David Lo (Singapore Management University), Gail Murphy (University of British Columbia)
12:00 10m Talk		Leveraging Historical Associations between Requirements and Source Code to Identify Impacted Classes. Track: Journal-First Papers. Davide Falessi (California Polytechnic State University), Justin Roll (Cal Poly, USA), Jin L.C. Guo (McGill University), Jane Cleland-Huang (University of Notre Dame)
12:10 10m Talk		Towards Predicting the Impact of Software Changes on Building Activities. Track: New Ideas and Emerging Results (NIER). Michele Tufano (College of William and Mary), Hitesh Sajnani (Microsoft), Kim Herzig (Tools for Software Engineers, Microsoft) [Pre-print]
12:20 10m Talk		Discussion Period. Track: Papers.