ICSE 2019
Sat 25 - Fri 31 May 2019 Montreal, QC, Canada
Wed 29 May 2019 11:00 - 11:20 at Place du Canada - Mining of Software Properties and Patterns Chair(s): Julia Rubin

Recent works have concluded that software is more repetitive and predictable, i.e., more natural, than English texts. These works included “simple/artificial” syntax rules in their language models. When we remove SyntaxTokens, we find that code is still repetitive and predictable, but only at levels slightly above English. Furthermore, previous works have compared individual Java programs to general English corpora, such as Gutenberg, which contains a historically large range of styles and subjects (e.g., Saint Augustine to Oscar Wilde). We perform an additional comparison of technical StackOverflow English discussions with source code and find that this restricted English is similarly repetitive to code. Although we find that code is less repetitive than previously thought, we suspect that API code element usage will be repetitive across software projects. For example, a file is opened and closed in the same manner irrespective of domain. When we restrict our n-grams to those contained in the Java API, we find that the entropy is significantly lower than that of the English corpora. Previous works have focused on linear sequences of tokens. When we extract program graphs of size 2, 3, and 4 nodes, we see that the abstract graph representation is much more concise and repetitive than the sequential representations of the same code. This suggests that future work should focus on statistical graph models that go beyond linear sequences of tokens. Our anonymous replication package makes our scripts and data available to future researchers and reviewers.
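The kind of measurement the abstract describes (n-gram entropy with and without syntax tokens) can be illustrated with a minimal sketch. This is not the authors' replication package: the add-k smoothing, the function names, and the small `SYNTAX_TOKENS` set below are all assumptions chosen for illustration, and the paper's actual definition of SyntaxTokens and its language model may differ.

```python
import math
from collections import Counter

# Hypothetical set of Java separators treated as "syntax tokens";
# the paper's own definition is not given in the abstract.
SYNTAX_TOKENS = {"(", ")", "{", "}", "[", "]", ";", ",", "."}


def strip_syntax(tokens):
    """Drop tokens that belong to the (assumed) syntax-token set."""
    return [t for t in tokens if t not in SYNTAX_TOKENS]


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def cross_entropy(train, test, n=3, k=1.0):
    """Average bits per n-gram of `test` under an add-k smoothed
    n-gram model estimated from `train` (lower = more predictable)."""
    vocab_size = len(set(train) | set(test))
    ngram_counts = Counter(ngrams(train, n))
    # Count each context (the first n-1 tokens) from the n-grams themselves,
    # so the smoothed probabilities for a context sum to one.
    context_counts = Counter()
    for gram, count in ngram_counts.items():
        context_counts[gram[:-1]] += count

    test_grams = ngrams(test, n)
    if not test_grams:
        raise ValueError("test sequence is shorter than n")

    total_bits = 0.0
    for gram in test_grams:
        prob = (ngram_counts[gram] + k) / (context_counts[gram[:-1]] + k * vocab_size)
        total_bits -= math.log2(prob)
    return total_bits / len(test_grams)
```

Under these assumptions, one would tokenize a training and a held-out file, then compare `cross_entropy(train, test)` against `cross_entropy(strip_syntax(train), strip_syntax(test))` to see how much of the predictability comes from the syntax tokens alone.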

Wed 29 May

Displayed time zone: Eastern Time (US & Canada)

11:00 - 12:30
Mining of Software Properties and Patterns: Technical Track / Journal-First Papers / Papers at Place du Canada
Chair(s): Julia Rubin, University of British Columbia
11:00
20m
Talk
Natural Software Revisited
Technical Track
Musfiqur Rahman, Concordia University, Montreal, Canada; Dharani Palani, Concordia University; Peter Rigby, Concordia University, Montreal, Canada
11:20
20m
Talk
Towards Automating Precision Studies of Clone Detectors (ACM SIGSOFT Distinguished Artifact Award; Artifacts Available; Artifacts Evaluated Reusable)
Technical Track
Vaibhav Saini, Microsoft, USA; Farima Farmahinifarahani, University of California at Irvine, USA; Yadong Lu, University of California at Irvine, USA; Di Yang, University of California at Irvine, USA; Pedro Martins, University of California at Irvine, USA; Hitesh Sajnani, Microsoft; Pierre Baldi, University of California at Irvine, USA; Crista Lopes
11:40
10m
Talk
Will This Clone be Short-lived? Towards a Better Understanding of the Characteristics of Short-lived Clones
Journal-First Papers
Patanamon Thongtanunam, The University of Melbourne; Weiyi Shang, Concordia University, Canada; Ahmed E. Hassan, Queen's University
11:50
10m
Talk
A systematic literature review on bad smells - 5 W's: which, when, what, who, where
Journal-First Papers
Elder Vicente De Paulo Sobrinho, Federal University of Triangulo Mineiro; Andrea De Lucia, University of Salerno; Marcelo De Almeida Maia, Federal University of Uberlandia
12:00
10m
Talk
Beyond Technical Aspects: How Do Community Smells Influence the Intensity of Code Smells?
Journal-First Papers
Fabio Palomba, University of Zurich; Damian Andrew Tamburri, TU/e; Francesca Arcelli Fontana, University of Milano-Bicocca; Rocco Oliveto, University of Molise; Andy Zaidman, TU Delft; Alexander Serebrenik, Eindhoven University of Technology
12:10
10m
Talk
On the Nature of Merge Conflicts: a Study of 2,731 Open Source Java Projects Hosted by GitHub
Journal-First Papers
Gleiph Ghiotto, UFJF; Leonardo Murta, Universidade Federal Fluminense (UFF); Marcio Barros, UNIRIO; Andre van der Hoek, University of California, Irvine
12:20
10m
Talk
Discussion Period
Papers