The value of engineering metrics


What makes a good engineering metric?

Our industry has been quite obsessed with activity metrics in recent years, and for good reason: engineering plays a growing role in every company, and it’s only natural to want to know how well it performs. One great thing about software is that most things are easily measurable. The challenge is that not everything that can be measured should be.

What makes a good measurement?

There are two guiding principles to keep in mind, both credited to Douglas W. Hubbard:

  1. What makes a measurement of high value is a lot of uncertainty combined with a high cost of being wrong.
  2. If a measurement matters at all, it is because it must have some conceivable effect on decisions and behaviors. If we can't identify a decision that could be affected by a proposed measurement and how it could change those decisions, then the measurement simply has no value.
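These two principles can be read as rough arithmetic. The sketch below is a simplification and not Hubbard's full method: it assumes that the value of a measurement is bounded by the chance of being wrong multiplied by the cost of being wrong, and that the value drops to zero when no decision depends on the result. The function names and the inputs are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def expected_opportunity_loss(chance_of_being_wrong: float,
                              cost_of_being_wrong: float) -> float:
    """Principle 1: more uncertainty and a higher cost of error both raise the stakes."""
    if not 0.0 <= chance_of_being_wrong <= 1.0:
        raise ValueError("chance_of_being_wrong must be between 0 and 1")
    if cost_of_being_wrong < 0:
        raise ValueError("cost_of_being_wrong must not be negative")
    return chance_of_being_wrong * cost_of_being_wrong


def measurement_value_ceiling(chance_of_being_wrong: float,
                              cost_of_being_wrong: float,
                              decisions_affected: Sequence[str]) -> float:
    """Principle 2: a measurement that cannot change any decision has no value."""
    if not decisions_affected:
        return 0.0
    return expected_opportunity_loss(chance_of_being_wrong, cost_of_being_wrong)
```

The cost can be expressed in any unit that is consistent across the measurements being compared (money, engineer-days, and so on). The result is only an upper bound on what a measurement could be worth, which is enough to rank candidate metrics against each other.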

Case study: time to review pull requests

Let’s take the controversial example of a widely used measurement, the average time to review pull requests (i.e., the delay from code being committed to being merged), and evaluate it through the lens of these principles.

How high is the level of uncertainty? I would expect most engineers to have a good sense of their team’s average review time. They certainly wouldn’t get it right down to the hour, but teams would know whether they are in the 0-3 day range or the 10-20 day range.

How high is the cost of being wrong? Not very. Yes, it might be the case that the time to review is a major bottleneck in some organizations. But that’s not something you need a measurement to tell you, because the level of uncertainty isn’t that high. Tell a team that their average time to review is 10 days, and chances are the engineers not only already know, but are seriously frustrated about it.

Would this measurement have some conceivable effect on decisions and behaviors? As pointed out earlier, it is unlikely that the team would learn much from the measurement in itself. That being said, aiming to minimize review time can lead to a lower average batch size (i.e., smaller pull requests), which is generally desirable. Even if the measurement won’t prove massively revealing, it’s handy as a reference to ensure that our process doesn’t degrade.
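For reference, the measurement discussed in this case study is simple to compute. The following is a minimal sketch that uses the definition given above (the delay between code being committed and being merged). The `PullRequest` record and its field names are assumptions made for illustration, not the schema of any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Iterable

SECONDS_PER_DAY = 86_400


@dataclass
class PullRequest:
    # Hypothetical record: only the two timestamps the metric needs.
    committed_at: datetime
    merged_at: datetime


def average_review_days(pull_requests: Iterable[PullRequest]) -> float:
    """Average delay, in days, between commit and merge across merged pull requests."""
    durations = [
        (pr.merged_at - pr.committed_at).total_seconds() / SECONDS_PER_DAY
        for pr in pull_requests
    ]
    if not durations:
        raise ValueError("at least one merged pull request is required")
    return mean(durations)
```

Tracked over time, a value like this serves as the reference point mentioned above: the trend matters more than the absolute number.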

Case study: how we allocate our efforts

How we distribute our efforts as a group across different categories of work is not easily tracked. John Cutler has, as always, an excellent visualization about that. Our bias here is obviously significant, as this is one of the key theses underlying Echoes.

How high is the level of uncertainty? Very high. Caught in the day to day, it’s extremely difficult to zoom out and get a good sense of how much time we truly spend chasing each of our different goals. It’s difficult to achieve at the individual level and only gets worse as we look at whole teams or entire departments.

How high is the cost of being wrong? Very high. We might be prioritizing work without having a clear understanding of our own capacity. We might be pressured into taking on unplanned work and totally dropping the ball on the highest priority items for the company. We might be spending so much time putting out fires that we can no longer step back and see that this is not how things should be. There is very likely nothing more strategic and expensive to waste than engineers’ time.

Would this measurement have some conceivable effect on decisions and behaviors? Absolutely! This measurement may:

  • Inform prioritization decisions, either by providing clarity on our true capacity, or by revealing how toil (or cognitive load, as Team Topologies calls it) needs to be controlled before literally anything else.
  • Inform organizational decisions by pointing out duplication of efforts across teams and opportunities for shared platform services.
  • Inform staffing, either by pointing out teams that have reached a point of diminishing returns on their mission (as could be shown, for example, by a high share of enhancements relative to innovation or new capabilities), or teams that simply cannot keep up with their workload.
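The allocation of effort discussed in this case study can be sketched as a share of total effort per category of work. This is an illustrative sketch only: it assumes each work item has already been labeled with a category and an effort estimate, and both the category names and the unit of effort are hypothetical. It does not describe how Echoes or any other tool computes this.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def effort_distribution(work_items: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Return each category's share of total effort, as a fraction between 0 and 1.

    Each work item is a (category, effort) pair. Effort can be in any
    consistent unit, for example engineer-days.
    """
    totals: Dict[str, float] = defaultdict(float)
    for category, effort in work_items:
        if effort < 0:
            raise ValueError("effort must not be negative")
        totals[category] += effort

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # nothing to distribute
    return {category: total / grand_total for category, total in totals.items()}
```

Comparing these shares between teams, or between periods for the same team, is what surfaces the signals listed above, such as a growing share of unplanned work or a high share of enhancements relative to new capabilities.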

Closing words

While it is commonly accepted that engineering productivity cannot be reduced to a single number, we are now faced with an abundance of metrics which may or may not act as a good proxy for performance. Sorting through all the tools at our disposal requires us to remain focused on the goal: reducing uncertainty and informing decisions.