Coverage Metrics

There are two strange metrics on the project where I work: how much is tested and how much works. Every time I should update these two numbers my head explodes. I don’t like them at all.

What means “how much is tested”?

Traditionally we are writing a percentage of completed checks out of all checks in the checklist. I don’t like this tradition, because checks are different. Here you should just assert error’s text message, and there you should produce this error under some conditions.

I see this solution. Every check should have corresponding number like a story point. Let’s call it a test point. To check text there is 1 tp, and to reproduce an error — 3 tp. That way the whole checklist costs 4 tp. When we are going through the checklist, a tester should write how much is tested for every check. For example, text is asserted — 1 pt. But reproducing is done only for several scenarios, so only 2 tp. In total, we completed 3 out of 4 test points.

What means “how much works”?

Based on what are other testers are doing, it is a number of “green” (or passed) checks out of all checks.

Consider this situation: test some kind of import. Test session is started, and voila, there no “Import” button. That’s a blocker and no more checks could be done. What we are writing for “how much works”? Zero. A developer fixes the button. Checks are flying, everything is perfect, we are writing 100% (a miracle).

Here is a question: is it correct that we put zero after the first session? After all, it was only a button, everything else was working already. I think it’s correct. But we should rename this metric, because it shows not how much works, but how much is available to a user and to what extent.

And what is in result?

Checklists are measured in test points, availability metric is renamed. Now watch carefully. We are throwing away percentages. After all, all checklists are different. Why should we use percentages, when they can’t show reality. 20% of a small task with 10 tp — not bad, 80% of a task with 1000 tp — can ruin a release. And strictly speaking this method is using not ratio, but interval scale. You can’t multiply and divide in this scale, therefore, can’t calculate percentage.

Guru talks

Michael Bolton writes a lot about choosing right scales and using them properly. Recently there was yet another article. His positions is that in testing even interval scale is too much, nominal and ordinal are more correct. There is an excellent example in that article, so I just leave it here: