Why is debugging so tedious?

I get the feeling that others see me as the bug guy, I mean the debug guy: people come to me with the weird ones. I don’t love it, but it doesn’t particularly annoy me either.
Picture this: it is 11pm. You are on your fourth coffee, have eleven tabs open, and keep running the same script with one variable changed each time. You hit a bug, form a theory and write a patch. “It must be the cache.” Fingers crossed… but the patch doesn’t work, so you try another one. “Probably a race condition.” Doesn’t work either. You start trying whatever comes to mind, and start wondering if you should drop everything and become a farmer instead. “It worked yesterday,” you say. Four hours later the error stops showing up. You are not sure why, but you are grateful, so you don’t ask questions. Yet, at the bottom of your heart, you know that some day it’s gonna come back.
I call this “throwing stones”, but the common English slang is “shooting in the dark”, or maybe “spray and pray”, whatever. You see a symptom and you guess, and take a shot.
There is another way tho, a better way. The crime has already happened, and the killer left a trace, and your job is to find the clues, follow the tracks, and catch the son of a bitch. Just gather the evidence and deduce what happened.
But what I see over and over is that people tend to skip that path, because deduction takes patience, like hunting. And in the modern, fast-paced world of productivity, patience is a luxury. Or is it a strategic investment?
“Move fast, break things” is a good mantra, and it works. When you are building something new, no amount of analysis beats putting a prototype in front of real users, because all you have is speculation, and analyzing speculation just produces better speculation. The problem and the product aren’t even clear yet. It is an open question with many possible answers, so “build it and see what happens” is actually the right approach.
However, with a bug you have the data. You have the product, the problem, what it should do and what it actually does. A bug is a well-posed problem: there is a unique solution to it, always (but maybe don’t quote me on that).
We took “move fast, break things”, which was a strategy for building under uncertainty, and applied it to debugging. So we patch first and ask questions later. We stitch up the patient, give them some paracetamol and send them home, only to get a call at 3am, on Slack, cuz we left the bullet in and they are bleeding out.
We talk a lot about SOLID, design patterns, clean code, architecture, yada yada. But I see very few people, if any, talking about debugging.
When production is on fire at 3am, nobody calls the architect. They call the bug guy. Sounds like a commercial, I know. Clean code is useless in that moment, and so are the design patterns. You need the guy who can read “Error 402386, segmentation fault”, and trace it back to some pointer buried very deep that is leaking memory, slowly building up for days. The devil is in the details, my friend.
Debugging is 90% finding the cause and 10% fixing it. Once you find the cause, the fix often takes only a couple of minutes.
So, here is what I learned from my scars
Some useful steps you can follow to make debugging less boring and more predictable.
Replicate it. Or, if you can’t replicate locally, find a real instance in your logs or your error monitor. (If you don’t use one, I genuinely don’t know what you are doing.) Replication gives you the one thing you need most: a clean yes-or-no on whether the bug is still there.
Read the stack trace honestly. Sometimes the symptom IS the cause and you are done. Often it is not. The trick is not deciding too fast.
Use session telemetry and correlated events when you have them. A bug is rarely an isolated event — it is connected to something: a user action, a deploy, a feature flag.
Run the debugger all the way to the explosion. Walk backwards from there.
And if you need to pull out the big guns, you have git bisect, which honestly, I have never seen a colleague use. Most engineers I’ve talked to either don’t know what it does, or have heard of it but never bothered to try it. git bisect runs a binary search over your commit history: you give it a commit where the bug exists and a commit where it didn’t, and it checks out the middle commit and asks “is the bug here?” You answer yes or no (here is where the replication pays off), and it picks the next middle. After roughly log2(N) rounds, where N is the number of commits in between, you have the exact commit that introduced the regression, which is a very small surface. Unless you didn’t do small atomic commits: if your commits are “fixed a bunch of stuff” with five hundred lines of changes inside, bisect still finds the commit, but then you are searching inside it from there. The discipline of small commits earns interest, and bisect is the day it pays.
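If the log N part sounds too good to be true, here is a minimal sketch of the search in Python. It is not how git implements it (real history is a graph, not a flat list), and the names first_bad_commit, is_bad and the c0…c999 commits are made up for illustration. The is_bad function stands in for your replication step. On the real thing the commands are git bisect start, git bisect bad, git bisect good <commit>, and git bisect run <script> if you have a script that exits 0 when the bug is gone and 1 when it is there.

```python
from typing import Callable, Sequence, Tuple


def first_bad_commit(
    commits: Sequence[str], is_bad: Callable[[str], bool]
) -> Tuple[str, int]:
    """Find the first bad commit by binary search; also return how many checks it took.

    Assumptions (for illustration only): commits are ordered oldest to newest,
    commits[0] is known good, commits[-1] is known bad, and once the bug
    appears it stays in every later commit.
    """
    good, bad = 0, len(commits) - 1  # indexes of the known-good and known-bad ends
    checks = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        checks += 1
        if is_bad(commits[mid]):  # the replication step: a clean yes or no
            bad = mid
        else:
            good = mid
    return commits[bad], checks


# Hypothetical history: 1000 commits, the regression sneaks in at commit c613.
commits = [f"c{i}" for i in range(1000)]
culprit, checks = first_bad_commit(commits, lambda c: int(c[1:]) >= 613)
print(culprit, checks)
```

A thousand commits, and it never needs more than ten checks to land on c613. That is the whole trick: every yes-or-no answer throws away half of the suspects.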
And if none of that works, you can always smash your laptop against the wall and become an alpaca farmer.