How to write good software faster (we spend 90% of our time debugging)

Years ago, I was eating dinner with relatives when one of them asked, 'So... what did you do today?' When I thought back, I realised I’d spent the better part of my afternoon puzzling over a glitch in my code. I’d eventually traced it to a colon that should have been a comma (Matlab cares about that sort of thing). When I realised just how much time I’d spent on something so trivial, I muttered a sheepish response, stuffed another bite of lasagne in my mouth, and tried not to think about how cruel a mistress code can be.

We spend 90% of our time debugging. Learning to type twice as fast will only speed up your overall development time by a few percent. So if you want to speed up how fast you develop finished software, focus your efforts on reducing time spent debugging.
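The arithmetic behind that claim is worth making explicit. Here's a back-of-the-envelope sketch (the 90% figure is the estimate above, not a measurement):

```python
# If 90% of development time goes to debugging, typing twice as fast
# only halves the remaining 10% spent writing code.
debug_share = 0.90               # estimated share of time spent debugging
writing_share = 1 - debug_share  # share spent actually typing code

new_total = debug_share + writing_share / 2  # type twice as fast
print(f"Total time drops to {new_total:.0%} of what it was")  # -> 95%
```

Halving your typing time buys you about five percent overall; halving your debugging time buys you forty-five.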

But of course it’s so tempting to code first and ask questions later. Hammering away on the keyboard into the greenfield of a blank IDE screen is incredibly satisfying - it feels like making rapid progress.

Yet efforts to speed up how fast you write the code in the first place will have much less impact than finding ways to write fewer bugs, or to find and fix them faster. So let’s first understand the claim that the bulk of our programming effort goes into debugging. Then let’s think about how this should change the way we write software.

What do we mean by ‘debugging’?

By debugging, I’m talking about investigating and addressing violated expectations[1], that is, time spent trying to understand why things aren’t working as you think they should and then acting on that.

Do we really spend that much of our time debugging?

Even though the machine does exactly what we tell it, we are surprised over and over when we write software.

That we spend the majority of our time debugging has held so consistently true, watching my own work and that of hundreds of other programmers, that I’ve come to treat it as an axiom. From the literature, Glass (2002) notes that “error removal is the most time-consuming phase of the lifecycle”.

Moreover, not all time spent debugging is equal. Finding and fixing a bug before the code is live is cheap in every sense: it’s probably quick to fix because you still have the full system in your mind, and you remember how it all works. There’s not much reputational cost. It’s probably still within working hours. It doesn’t affect the roadmap or the user experience; no decisions have been made on the back of it, and no money has been lost.

In contrast, a bug in an important algorithm that’s been running for a year is expensive in every way. It might take ages even to figure out when the bug was introduced, let alone its cause. You may not be able to use your debugger on the production server, so you’re relying on spotty logs. Time spent fire-fighting is demoralising and can be high stress - big bugs always seem to show up the night before an investor meeting. They can be more complex to fix - perhaps you need downtime, a custom deployment, or to reprocess a year’s worth of data from backups.


In other words, bugs cost much more as you get further towards production. It could take literally weeks of investigation for a complex or high-stakes bug.

When we talk about debugging time, it’s easy to ignore these kinds of outliers, to put them in a different category. But they’re not a different category: they should be counted as part of the development time for that piece of work. Time spent debugging has a long tail. I wouldn’t be surprised to find that it follows some kind of exponential distribution, like earthquake magnitudes.

In other words, costly bugs may be rare, but when you amortise them over multiple pieces of work, they significantly drive up the proportion of overall time spent debugging. When you add it all up over the lifecycle of your code, debugging ends up taking up much more time than we imagine, and it’s usually not enjoyable time.
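To see how a long tail inflates the total, here’s a toy simulation (the exponential distribution and the two-hour mean are assumptions for illustration, not measurements):

```python
import random

random.seed(0)

# Toy model: time-to-fix for 1,000 bugs drawn from a long-tailed
# (exponential) distribution with a mean of 2 hours per bug.
hours = sorted(random.expovariate(1 / 2) for _ in range(1000))

worst_decile = hours[-100:]  # the 10% most expensive bugs
share = sum(worst_decile) / sum(hours)
print(f"The worst 10% of bugs take {share:.0%} of all debugging time")
```

With an exponential distribution, the worst decile eats roughly a third of the total debugging time; with a heavier tail, closer to earthquake statistics, even more.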

So how can we spend less time debugging?

To reduce the time we spend debugging, we should focus our efforts on introducing fewer bugs in the first place, finding them sooner, identifying their bounds correctly, and fixing them well.

Introduce fewer bugs in the first place

Find bugs early and quickly

Aim for a debug cycle time of less than a second

Sharpen your axe

Find bugs upstream (further up the call stack)

Static typing

Improve your logging

Be a scientist

Fix bugs “well”

If you’ve found the problem early, identified its bounds correctly, and tested your hypothesis successfully, then fixing it will hopefully be the easy part.

When you’ve found a bug, don’t just rush to fix it. First, introduce a new test that isolates it. Make sure that the test fails. Fix the bug. Make sure that the test now passes.

It’s important to confirm that the test fails before you fix the bug, otherwise you won’t notice if you’ve written a crappy test. You may think your test captures the bug, but perhaps the test isn’t quite right, or isn’t even running at all! If you only confirm that it passes after you think you’ve fixed things, you may never realise that your test was inadequate - and perhaps your fix too.
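Here’s what that workflow looks like as a minimal sketch, using pytest and a hypothetical median() bug standing in for whatever you’ve just tracked down:

```python
# test_median.py -- a hypothetical bug: median() used to index into
# the list without sorting it first, so unsorted input gave wrong answers.

def median(values):
    ordered = sorted(values)  # the fix: sort before picking the middle
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def test_median_of_unsorted_input():
    # Written BEFORE applying the fix: run it once against the buggy
    # version and watch it fail, proving the test really isolates the bug.
    assert median([3, 1, 2]) == 2
    assert median([4, 1, 3, 2]) == 2.5
```

Run `pytest test_median.py` against the buggy version first (it should fail), then apply the fix and run it again (it should pass).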

Don’t create a new bug when you fix the old one. Yeah, I don’t know how to avoid doing that, but try not to.

To sum it up: we spend most of our development time debugging, so to write good software faster, introduce fewer bugs in the first place, find them early and quickly, identify their bounds correctly, and fix them well.

Footnotes:

[1] Glass (2002) refers to ‘error removal’, which is similar but too narrow for my tastes. The investigative aspect of debugging is often the hard and uncertain part - we can all think of a time we spent two hours tracking down the source of a problem, and then fixed it in under a minute once we found it. Indeed the fruits of that investigation don’t always lead to action. We may put in a lot of effort to understand an error, only to decide at the end of the debugging investigation that it’s too much hassle to fix. Or indeed, we might work hard to figure out why the program is behaving as it is, only to eventually realise that it’s correct and our expectations were wrong - i.e. there was no error to remove. That still counts as debugging to me, since things weren’t working as I thought they should. So debugging is investigating and addressing violated expectations, a superset of error removal.

Resources

Glass, R. L. (2002). Facts and Fallacies of Software Engineering. Addison-Wesley.