Coverage is not strongly correlated with test suite effectiveness

31 May 2014

conference paper
Published by Association for Computing Machinery (ACM)

p. 435-445
https://doi.org/10.1145/2568225.2568271

Abstract

The coverage of a test suite is often used as a proxy for its ability to detect faults. However, previous studies that investigated the correlation between code coverage and test suite effectiveness have failed to reach a consensus about the nature and strength of the relationship between these test suite characteristics. Moreover, many of the studies were done with small or synthetic programs, making it unclear whether their results generalize to larger programs, and some of the studies did not account for the confounding influence of test suite size. In addition, most of the studies were done with adequate suites, which are rare in practice, so the results may not generalize to typical test suites. We have extended these studies by evaluating the relationship between test suite size, coverage, and effectiveness for large Java programs. Our study is the largest to date in the literature: we generated 31,000 test suites for five systems consisting of up to 724,000 lines of source code. We measured the statement coverage, decision coverage, and modified condition coverage of these suites and used mutation testing to evaluate their fault detection effectiveness. We found that there is a low to moderate correlation between coverage and effectiveness when the number of test cases in the suite is controlled for. In addition, we found that stronger forms of coverage do not provide greater insight into the effectiveness of the suite. Our results suggest that coverage, while useful for identifying under-tested parts of a program, should not be used as a quality target because it is not a good indicator of test suite effectiveness.
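
To make "controlling for the number of test cases" concrete, the following is a minimal Python sketch, not the authors' tooling. It assumes a rank correlation (Kendall's tau-b) as the statistic, which the abstract does not specify, and it uses purely synthetic data: the test pool, the `covers` and `kills` mappings, and every numeric parameter are hypothetical. Suites are sampled at a few fixed sizes, and coverage is correlated with mutation score only among suites of the same size, so size cannot drive the relationship.

```python
import math
import random
from itertools import combinations


def kendall_tau_b(xs, ys):
    """Kendall's tau-b rank correlation, which adjusts for ties in either list."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from both factors
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + ties_x) * (untied + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


def suite_metrics(suite, covers, kills, n_statements, n_mutants):
    """Return (statement coverage, mutation score) for one suite of test ids."""
    covered = set().union(*(covers[t] for t in suite))
    killed = set().union(*(kills[t] for t in suite))
    return len(covered) / n_statements, len(killed) / n_mutants


def correlation_by_size(pool_size=50, sizes=(3, 10, 30), suites_per_size=100, seed=0):
    """Correlate coverage with mutation score separately for each suite size."""
    rng = random.Random(seed)
    n_statements, n_mutants = 200, 100  # hypothetical program and mutant counts
    # Hypothetical test pool: each test covers some statements and kills some mutants.
    covers = {t: set(rng.sample(range(n_statements), rng.randint(5, 40)))
              for t in range(pool_size)}
    kills = {t: set(rng.sample(range(n_mutants), rng.randint(0, 15)))
             for t in range(pool_size)}
    result = {}
    for size in sizes:
        pairs = [suite_metrics(rng.sample(range(pool_size), size),
                               covers, kills, n_statements, n_mutants)
                 for _ in range(suites_per_size)]
        coverage, score = zip(*pairs)
        result[size] = kendall_tau_b(coverage, score)
    return result


if __name__ == "__main__":
    for size, tau in correlation_by_size().items():
        print(f"suite size {size:>2}: tau-b = {tau:+.3f}")
```

Because the synthetic `kills` sets are drawn independently of the `covers` sets, the printed values say nothing about real programs; the sketch only shows the shape of the analysis. Pooling suites of all sizes into one correlation would instead mix in the effect of size, since larger suites tend to both cover more and kill more.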

