Structural Coverage Criteria for Neural Networks Could Be Misleading
The software engineering community has shown rapidly growing interest in quality assurance for DNN-based systems. An emerging topic in this area is structural coverage criteria for testing neural networks, which are inspired by the coverage metrics used in conventional software testing. In this short paper, we argue that these criteria could be misleading because of the fundamental differences between neural networks and human-written programs. Our preliminary exploration shows that (1) adversarial examples are pervasively distributed across the finely divided space defined by such coverage criteria, whereas the available natural samples are very sparse; and, as a consequence, (2) the fault-detection “capabilities” previously attributed to high-coverage testing are more likely due to the adversary-oriented search than to genuinely “high” coverage.
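To make “structural coverage criteria” and the “finely divided space” concrete, here is a minimal sketch of two widely used definitions. The first is threshold-based neuron coverage, as popularized by DeepXplore. The second is k-multisection neuron coverage, as defined in DeepGauge, which splits each neuron's training-time activation range into `k` sections. The abstract does not name specific criteria, so the choice of these two is an assumption. The function names, the list-of-rows activation layout, and the omission of per-layer activation scaling are illustrative simplifications, not the authors' implementation.

```python
from typing import Sequence

# Assumed layout: acts[i][j] is the activation of neuron j on input i.
Activations = Sequence[Sequence[float]]


def neuron_coverage(acts: Activations, threshold: float = 0.0) -> float:
    """Fraction of neurons whose activation exceeds `threshold` on at least one input."""
    num_neurons = len(acts[0])
    covered = [False] * num_neurons
    for row in acts:
        for j, value in enumerate(row):
            if value > threshold:
                covered[j] = True
    return sum(covered) / num_neurons


def k_multisection_coverage(train_acts: Activations, test_acts: Activations, k: int) -> float:
    """Fraction of (neuron, section) cells hit by the test inputs.

    Each neuron's range [low, high] is taken from the training activations
    and split into k equal-width sections; test values outside the range
    are ignored here (other criteria track those boundary regions separately).
    """
    num_neurons = len(train_acts[0])
    hit = set()
    for j in range(num_neurons):
        low = min(row[j] for row in train_acts)
        high = max(row[j] for row in train_acts)
        width = (high - low) / k
        if width == 0:
            continue  # constant neuron: no sections can be distinguished
        for row in test_acts:
            value = row[j]
            if low <= value <= high:
                # Clamp so that value == high falls into the last section.
                section = min(int((value - low) / width), k - 1)
                hit.add((j, section))
    return len(hit) / (num_neurons * k)


if __name__ == "__main__":
    print(neuron_coverage([[0.1, 0.0, 0.9], [0.5, 0.0, 0.2]], threshold=0.25))
    print(k_multisection_coverage([[0.0, 0.0], [1.0, 2.0]],
                                  [[0.1, 0.5], [0.95, 1.9]], k=4))
```

Under this definition, each input lands in at most one section per neuron. A set of `m` inputs can therefore cover at most `m / k` of the cells, whatever the network looks like. As `k` grows, a fixed pool of natural samples necessarily leaves most of the space uncovered, while a coverage-guided search can keep generating new inputs to fill it.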