You need metrics and measurements around testing and software quality. But when misapplied, they can create bigger problems than they solve. In this blog, we’ll look at a starting point for testing metrics.
We have long known that announcing metrics changes the behavior of the people whose work you are measuring. For example, when competitive athletes have to hit a certain metric to qualify for an event, they adjust their training to meet that goal.
When you identify and use measurements of interest wisely, they can drive desired behavior while giving a valid picture of project status.
However, testing metrics programs can be problematic. Used well, they can improve the performance of the team and the entire organization. Used poorly, they can undermine your team’s morale, hurt productivity and, in turn, lower the overall quality of the software produced.
Managers and leaders need to understand the state of a project at a high level. The people working on the project also need the ability to look at summary data beyond the view they get “in the trenches.” The metrics I describe can benefit both management and your team. Here are some measurements I find useful.
Measuring Test Coverage
Test coverage is a broad term. The short definition is “looking at what you have tested and what you haven’t.” This is not really a single measure but a collection of things to consider independently and as a whole. It helps answer the question of how much testing you have done and how much remains.
Test coverage can shed light on the question of why you didn’t find bugs. More importantly, it can help you see whether you are testing the right things.
The easiest place to start is the obvious: the codebase. Adding a code coverage tool to your continuous integration system will give you an idea of how much of your codebase is covered by unit tests. The first few reports may not be immediately useful on their own. However, they give you a baseline against which to compare later integrations and deployments, which will help you look for trends. If unit test coverage is going down, you can expect to see an increase in bugs and in the time spent on them.
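As a minimal sketch of the baseline idea, assuming your coverage tool emits a total percentage per build (the file name, build IDs and alert threshold here are made up for illustration), you might track the trend like this:

```python
import json
from pathlib import Path

HISTORY_FILE = Path("coverage_history.json")  # hypothetical location
ALERT_DROP = 2.0  # flag drops of more than two percentage points (arbitrary)

def record_coverage(build_id: str, percent: float) -> None:
    """Append this build's total coverage to the history file."""
    history = json.loads(HISTORY_FILE.read_text()) if HISTORY_FILE.exists() else []
    history.append({"build": build_id, "coverage": percent})
    HISTORY_FILE.write_text(json.dumps(history, indent=2))

def coverage_trend(history: list) -> str:
    """Compare the latest build against the baseline (first recorded build)."""
    if len(history) < 2:
        return "Not enough data yet; this build becomes the baseline."
    baseline, latest = history[0]["coverage"], history[-1]["coverage"]
    if baseline - latest > ALERT_DROP:
        return f"Coverage fell from {baseline:.1f}% to {latest:.1f}%; expect more time spent on bugs."
    return f"Coverage holding steady: baseline {baseline:.1f}%, latest {latest:.1f}%."

if __name__ == "__main__":
    record_coverage("build-42", 81.5)  # made-up build and number
    print(coverage_trend(json.loads(HISTORY_FILE.read_text())))
```

The point is not the script itself but the habit: record the number every build so the trend, not any single reading, drives the conversation.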
You also want to consider manual testing. One way to do this is through product inventories. Talking about software through abstractions like pages, features, scenarios, configurations and so forth is not new. Most of us do that frequently. We can keep track of what we test without automation or other tools the same way – by these abstractions.
Using exploratory testing techniques, you can track areas of the application you are exercising. By keeping a “testing journal” where you record the types of activity or transactions used for each aspect of the software, you can build a reference of what you have tested. These can be kept in a lightweight, easy-to-share format, like an Excel sheet or a mind map. This provides a map of coverage that code coverage numbers alone might not capture. You can review this coverage in much the same way you review code coverage.
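Here is one lightweight way such a journal could work, sketched in Python; the CSV format, field names and application areas are all illustrative assumptions, not prescriptions:

```python
import csv
from collections import Counter
from datetime import date

def log_session(path: str, area: str, activity: str, notes: str = "") -> None:
    """Append one exploratory testing session to the journal CSV."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([date.today().isoformat(), area, activity, notes])

def coverage_summary(path: str) -> Counter:
    """Count sessions per application area; areas you never log are your gaps."""
    with open(path, newline="") as f:
        return Counter(row[1] for row in csv.reader(f))

# Example usage with made-up areas:
log_session("journal.csv", "checkout", "boundary tests on quantity field")
log_session("journal.csv", "login", "session timeout scenarios")
print(coverage_summary("journal.csv"))  # Counter({'checkout': 1, 'login': 1})
```

A spreadsheet or mind map serves the same purpose; the structure matters less than recording the sessions consistently.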
Still, test coverage does not tell us anything about the quality of testing. It doesn’t show if you’re designing tests that will find important problems. It does, however, give you a place to start the conversation. The goal in looking at test coverage is to see what’s in the test plan, review what’s already covered and see what’s missing.
Measuring Rework
Rework is the result of something not being done quite right the first time. When you track rework, you want to look at work items outside of testing, like multiple changes to the same piece of code, because they impact testing even though they happen outside of it.
Bugs are usually the first piece of evidence. For example, you merge a new change into the test branch. When you start looking closely, you find that submitting the page fails under a few different conditions. You then have to spend time investigating and collecting the errors, reporting bugs and then waiting for the development team to build and merge the fix back into the test branch. Hopefully, things work the second time around, but sometimes they don’t.
This can also happen before development starts. Many organizations use a Three Amigos meeting before any work begins. This helps make sure everyone understands what the change does. It also provides an opportunity to decide if the solution is the right choice for your needs.
One client I worked with would hold these a few times a month. We would look at each request and talk about it. Everyone was in the room, and everyone had an equal voice. Surprisingly often, we would find the new change conflicted with a feature already in progress. Sometimes one aspect of the change wasn’t clear and needed more information. We would send that request back for clarification, and we would look at it again in a later session. Many times, stakeholders would point out other problems with the request and how it would impact business units in ways the original requestors had not realized.
In these cases, if the development team had started working – instead of waiting for clarification – that would have been a form of rework, even if they executed the original change flawlessly.
Rework is usually discovered through testing. But the story it tells is usually about the development environment. When rework happens frequently, it can mean the team is missing people with the right skill sets. It may also point to a rushed timeline or to sacrificing quality to get the product built faster.
Each unit of rework affects your schedule and cascades into future work. It also affects the budget: every instance means more people working on tasks for longer than you initially planned.
The easiest way to begin tracking rework is to measure bugs or work items that move backward in the flow. The point is not to count bugs or to see how long it takes you to test something. The intent is to learn what it is in the environment that causes a large number of bugs or false starts, which shows you how to improve the situation through skill development or by removing project constraints.
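A minimal sketch of counting backward moves, assuming you can export work item status transitions from your tracker (the workflow states, export format and sample data below are all assumptions):

```python
from collections import Counter

# Hypothetical ordering of workflow states; any transition to a lower
# index counts as moving backward in the flow, i.e. rework.
FLOW = ["todo", "in_development", "in_test", "done"]
RANK = {state: i for i, state in enumerate(FLOW)}

def backward_moves(transitions) -> Counter:
    """Count backward transitions per work area.

    `transitions` is an iterable of (area, from_state, to_state) tuples,
    e.g. exported from your issue tracker.
    """
    counts = Counter()
    for area, src, dst in transitions:
        if RANK[dst] < RANK[src]:
            counts[area] += 1
    return counts

# Made-up data: the checkout area bounced back to development twice.
sample = [
    ("checkout", "in_test", "in_development"),
    ("checkout", "in_test", "in_development"),
    ("reports", "in_development", "in_test"),
]
print(backward_moves(sample))  # Counter({'checkout': 2})
```

A count like this tells you where to start asking questions; it is the follow-up conversation, not the number, that reveals the missing skills or project constraints.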
Measuring Regression Problems
Problems found during regression testing are a specific form of rework. Tracking these is helpful when evaluating an application’s readiness for release to acceptance testing or production.
Regression bugs tell us less about the testing and more about development practices or the environment itself. Like most metrics around testing, regression bugs are trailing indicators for development methods. When regression testing finds bugs you previously found and fixed, there are issues in the development process to address.
Sometimes regression testing is done in an environment other than the main testing location. This might be an “acceptance test” environment or some other location. When you find bugs in that environment not found elsewhere, this points to the deployment process. Tracking these bugs gives insight into problems in configuration, data structures and other environmental concerns. Unfortunately, these can be among the most expensive issues to fix before releasing to production.
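One simple way to surface these, assuming each bug record carries the environment where it was found (the field names and data here are hypothetical):

```python
from collections import Counter

def environment_only_bugs(bugs, main_env: str = "test"):
    """Group bugs by the environment where they were found and list
    those that never appeared in the main test environment.

    `bugs` is a list of dicts with hypothetical 'id' and 'found_in' keys.
    """
    by_env = Counter(bug["found_in"] for bug in bugs)
    escaped = [b["id"] for b in bugs if b["found_in"] != main_env]
    return by_env, escaped

# Made-up data: two bugs surfaced only in the acceptance environment,
# suggesting a deployment or configuration difference worth investigating.
sample = [
    {"id": "BUG-101", "found_in": "test"},
    {"id": "BUG-102", "found_in": "acceptance"},
    {"id": "BUG-103", "found_in": "acceptance"},
]
print(environment_only_bugs(sample))
```

Bugs that cluster in one environment point away from the code and toward how that environment is built and configured.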
Measurement Drives Behavior
When you identify and use measurements of interest wisely, you can drive desired behavior by identifying the target or goal. People change their behavior to make sure any metric moves the “right way.” Sometimes this is done on purpose; sometimes it is subconscious (Goodhart’s law). The measurements you select need to help you understand specific problems. Use them as information points or boundaries around the development and deployment of the software. To ease tensions, make it clear these are not performance measures for teams or individuals, and communicate the measurements and their purpose clearly.
Those communications are important reminders for you as well. It can be tempting to reward testers who find the most problems or developers who write the fewest bugs. I find this counterproductive; rewarding either does more harm than good.
Experienced developers take on the most complex and challenging work, which increases the odds of finding problems. Better testers will take their time and dive into the behavior of the software. They tend to look at the entire system, not simply what has changed. They may not find as many bugs as someone blowing through the application to find smaller, cosmetic issues. However, they are likely to find problems impacting the behavior of the software and the experience of the customers.
Conclusion
What metric should you start with? Pick a problem impacting your team, organization or customers. Find ways to measure around that problem. Let the team know the purpose and why it matters. Let the measurement itself help drive good behavior.