Perception and Practices of Differential Testing
SEIP Industry Program
Tens of thousands of engineers contribute to Google’s codebase, which spans billions of lines of code. To ensure high code quality, a tremendous amount of effort has been invested in new testing techniques and frameworks. However, with increasingly complex data structures and software pipelines, traditional testing strategies based on hand-written test cases cannot scale well enough to achieve the desired level of test adequacy. Differential (diff) testing is one of the new testing techniques adapted to fill this gap. It runs two versions of a piece of software on the same input, and then compares the outputs of the two runs to find abnormalities that may indicate bugs.
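The core idea described above can be illustrated with a minimal sketch. It is not one of the frameworks used at Google; the function `diff_test` and the two `round_price` versions are hypothetical names invented for this example, and a plain equality check is assumed to be enough to compare outputs.

```python
from typing import Any, Callable, Iterable


def diff_test(
    old_version: Callable[[Any], Any],
    new_version: Callable[[Any], Any],
    inputs: Iterable[Any],
) -> list[tuple[Any, Any, Any]]:
    """Run both versions on every input and collect each disagreement."""
    mismatches = []
    for value in inputs:
        old_out = old_version(value)
        new_out = new_version(value)
        # No expected value is written by hand: the old version's
        # output serves as the reference for the new version.
        if old_out != new_out:
            mismatches.append((value, old_out, new_out))
    return mismatches


# Hypothetical system under test: two versions of a rounding helper.
def round_price_v1(cents: int) -> int:
    return (cents // 5) * 5  # rounds down to a multiple of 5


def round_price_v2(cents: int) -> int:
    return ((cents + 2) // 5) * 5  # rounds to the nearest multiple of 5


diffs = diff_test(round_price_v1, round_price_v2, [10, 12, 13, 14, 15])
for value, old_out, new_out in diffs:
    print(f"input={value}: old={old_out}, new={new_out}")
```

Each reported difference is only a candidate abnormality. Whether it reflects a bug or an intended behavior change still has to be decided by a person reviewing the diff.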
Differential testing has been adopted quickly by hundreds of teams across Google. Meanwhile, many new diff testing frameworks have been developed to simplify the creation, maintenance, and analysis of diff tests. Intrigued by this emerging popularity, we conducted the first empirical study of differential testing in practice at large scale. In this study, we investigated the common practices and benefits of diff testing. We further explored the features of diff tests that users value the most and the pain points of using diff testing. Through a user study, we discovered that differential testing does not replace fine-grained testing techniques; instead, it supplements existing test suites. It helps users verify the impact of a change on unmodified and unfamiliar components in the absence of a test oracle. In terms of limitations, diff testing often generates noisy and flaky outcomes. Finally, we highlight open problems to guide future research in testing.