A Novel Neural Source Code Representation based on Abstract Syntax Tree (ICSE 2019 - Technical Track) - International Conference on Software Engineering 2019 in Montreal, Canada

Sat 25 - Fri 31 May 2019 Montreal, QC, Canada

Who

Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Kaixuan Wang, Xudong Liu

Track

ICSE 2019 Technical Track

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Fri 31 May 2019 11:40 - 12:00 at Place du Canada, in the session "Machine Learning in Static Analysis". Chair(s): Na Meng

Abstract

Exploiting machine learning techniques for analyzing programs has attracted much attention. One key problem is how to represent code fragments well for follow-up analysis. Traditional information retrieval based methods often treat programs as natural language texts, which could miss important semantic information of source code. Recently, state-of-the-art studies demonstrate that abstract syntax tree (AST) based neural models can better represent source code. However, the sizes of ASTs are usually large and the existing models are prone to the long-term dependency problem. In this paper, we propose a novel AST-based Neural Network (ASTNN) for source code representation. Unlike existing models that work on entire ASTs, ASTNN splits each large AST into a sequence of small statement trees, and recursively encodes the statement trees to vectors by capturing the lexical and syntactical knowledge of statements. Based on the sequence of statement vectors, a bidirectional RNN model is used to leverage the naturalness of statements and finally produce the vector representation of a code fragment. We have applied our neural network based source code representation method to two common program comprehension tasks: source code classification and code clone detection. Experimental results on the two tasks indicate that our model is superior to state-of-the-art approaches.
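The splitting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Python source and the standard-library `ast` module purely for convenience, and the helper names `split_into_statements` and `statement_tokens` are hypothetical. It only shows how one AST can be cut into a sequence of small statement trees, where a compound statement keeps its header and each nested statement becomes its own tree. The recursive encoding of each tree into a vector and the bidirectional RNN over the resulting sequence are left out.

```python
import ast


def split_into_statements(source):
    """Collect every statement node in preorder (parents before nested statements)."""
    statements = []

    def visit(node):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.stmt):
                statements.append(child)
            # Keep descending so statements nested in bodies are found too.
            visit(child)

    visit(ast.parse(source))
    return statements


def statement_tokens(node):
    """Flatten one statement tree to node-type names, skipping nested statements."""
    tokens = [type(node).__name__]
    for child in ast.iter_child_nodes(node):
        if isinstance(child, ast.stmt):
            # Nested statements are separate trees in the sequence.
            continue
        tokens.extend(statement_tokens(child))
    return tokens


# Hypothetical sample input, chosen only for illustration.
SAMPLE = "def f(x):\n    y = x + 1\n    return y\n"
sequence = [statement_tokens(s) for s in split_into_statements(SAMPLE)]
for tokens in sequence:
    print(tokens)
```

Each printed list stands in for one statement tree. In a model of the kind the abstract describes, every such tree would be encoded to a vector, and the sequence of vectors would then be fed to a bidirectional RNN.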

Link to Preprint

http://xuwang.tech/paper/astnn_icse2019.pdf

Jian Zhang

Beihang University

Xu Wang

Beihang University

China

Hongyu Zhang

The University of Newcastle

Australia

Hailong Sun

Beihang University

Kaixuan Wang

Beihang University

Xudong Liu

Beihang University

Session Program

Fri 31 May
Displayed time zone: Eastern Time (US & Canada)

11:00 - 12:30	Machine Learning in Static Analysis (Papers / Technical Track) at Place du Canada. Chair(s): Na Meng (Virginia Tech)

11:00 20m Talk		Training Binary Classifiers as Data Structure Invariants (Technical Track). Facundo Molina (Universidad Nacional de Rio Cuarto, Argentina); Renzo Degiovanni (SnT, University of Luxembourg); Pablo Ponzio (Dept. of Computer Science FCEFQyN, University of Rio Cuarto); Germán Regis (Universidad Nacional de Río Cuarto); Nazareno Aguirre (Dept. of Computer Science FCEFQyN, University of Rio Cuarto); Marcelo F. Frias (Dept. of Software Engineering, Instituto Tecnológico de Buenos Aires)
11:20 20m Talk		Graph Embedding based Familial Analysis of Android Malware using Unsupervised Learning (Technical Track). Ming Fan (MOEKLINNS Lab, Department of Computer Science and Technology, Xi'an Jiaotong University, 710049, China); Xiapu Luo; Jun Liu (MOEKLINNS Lab, Department of Computer Science and Technology, Xi'an Jiaotong University, 710049, China); Meng Wang (University of Bristol, UK); Chunyin Nong; Qinghua Zheng (MOEKLINNS Lab, Department of Computer Science and Technology, Xi'an Jiaotong University, 710049, China); Ting Liu (MOEKLINNS Lab, Department of Computer Science and Technology, Xi'an Jiaotong University, 710049, China)
11:40 20m Talk		A Novel Neural Source Code Representation based on Abstract Syntax Tree (Technical Track). Jian Zhang (Beihang University); Xu Wang (Beihang University); Hongyu Zhang (The University of Newcastle); Hailong Sun (Beihang University); Kaixuan Wang (Beihang University); Xudong Liu (Beihang University)
12:00 20m Talk		A Neural Model for Generating Natural Language Summaries of Program Subroutines (Technical Track). Alexander LeClair (University Of Notre Dame); Siyuan Jiang (Eastern Michigan University); Collin McMillan
12:20 10m Talk		Discussion Period (Papers)