ActionNet: Vision-based Workflow Action Recognition From Programming Screencasts
Programming screencasts have two important applications in the software engineering context: studying developer behaviors and information needs, and disseminating software engineering knowledge. Although programming screencasts are easy to produce, they are not easy to analyze or index because the data consist of images. Existing techniques extract only content from screencasts, but ignore the workflow actions by which developers accomplish programming tasks. This significantly limits the effective use of programming screencasts in downstream applications. In this paper, we present the first technique for recognizing workflow actions in programming screencasts. Our technique exploits image differencing and a Convolutional Neural Network (CNN) to analyze the correspondence and change of consecutive frames, based on which nine classes of frequent developer actions can be recognized from programming screencasts. Using programming screencasts from YouTube, we evaluate different configurations of our CNN model and the performance of our technique for developer action recognition across developers, working environments and programming languages. Using screencasts of developers’ real work, we demonstrate the usefulness of our technique in a practical application for action-aware extraction of key-code frames in developers’ work.
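To make the notion of image differencing on consecutive frames concrete, the following is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the implementation used in this work: frames are assumed to be grayscale and represented as nested lists of pixel intensities, and the names `Frame`, `changed_region`, and `threshold` are hypothetical. The sketch locates the bounding box of the pixels that changed between two frames, which is the kind of region a classifier such as a CNN could then inspect.

```python
from typing import List, Optional, Tuple

# Assumption: a frame is a grayscale image stored as rows of intensities (0-255).
Frame = List[List[int]]


def changed_region(
    prev: Frame, curr: Frame, threshold: int = 0
) -> Optional[Tuple[int, int, int, int]]:
    """Return the inclusive bounding box (top, left, bottom, right) of all
    pixels whose absolute difference exceeds `threshold`, or None if the
    two frames are identical under that threshold."""
    if len(prev) != len(curr) or any(len(a) != len(b) for a, b in zip(prev, curr)):
        raise ValueError("frames must have the same dimensions")

    rows: List[int] = []
    cols: List[int] = []
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (p, c) in enumerate(zip(row_prev, row_curr)):
            if abs(p - c) > threshold:  # pixel-wise image differencing
                rows.append(y)
                cols.append(x)

    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))


# Toy example: a blank 4x6 frame, then the same frame with two changed pixels.
prev_frame: Frame = [[0] * 6 for _ in range(4)]
curr_frame: Frame = [row[:] for row in prev_frame]
curr_frame[1][2] = 255
curr_frame[2][4] = 255

box = changed_region(prev_frame, curr_frame)
print(box)
```

In a real pipeline the frames would come from a video decoder and the changed region would be cropped and passed to a trained model; both steps are omitted here because the abstract does not specify them.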