When Code Completion Fails: A Case Study on Real-World Completions
Code completion is commonly used by software developers and is integrated into all major IDEs. Good completion tools can not only save time and effort, but may even help developers avoid incorrect API usage. Many proposed completion tools have shown promising results on synthetic benchmarks, but these benchmarks make no claims about the realism of the completions they test. This lack of grounding in real-world data could hinder our scientific understanding of developer needs and of the efficacy of completion models. This paper presents a case study on 15,000 code completions applied by 66 real developers, in which we analyze how well the recorded completions are represented in synthetic benchmarks, to inform future research in this area. Our results show substantial differences between the distributions of completed tokens in the recorded and synthetic data, such as a remarkable prevalence of intra-project API completions in the real-world data. When applied to the recorded completions, our models were also far less accurate than on synthetic data. Furthermore, while most of the recorded completions were applied very quickly, the cases that consumed most of the developers' time were also those for which prediction performance was substantially lower, an effect that is invisible in synthetic benchmarks. Through our investigation we highlight nine issues and offer concrete recommendations to improve the design and evaluation of future completion tools.