Three things I believe.
Most production bugs are predictable from the code that produced them. Read the surrounding twenty lines carefully, ask what the system looks like under load, and the bug is usually visible before it ships. We pretend otherwise because the alternative means most outages were preventable, and that is not a thing most teams want to write down.
The hardest problems in software are organisational, not technical. Two teams that need to ship in lockstep do not need a better service boundary; they need the same manager. Most architectures I have helped unwind were correct on a slide and wrong on a calendar.
If you cannot explain it to a junior, you do not understand it yet. This is not a teaching maxim; it is a debugging tool. The moment you reach for "well, it depends" without naming what it depends on, you have stopped reasoning and started gesturing.
who I am
I am Tomasz Kulesza. I currently work at Miloan sp. z o.o., where I spend most of my time on PHP systems that move money around and try not to lose it.
This site is my collection of things that actually survived a collision with reality. It is heavily PHP-based, because PHP powers most of the production systems I deal with. The rest is the boring stuff at the edges: payment gateways, state machines, idempotency, and service boundaries that looked perfectly clean in 2021 and had become very expensive to live with by 2024. That boring stuff is exactly where production goes down in flames.
how to reach me
Email is the slowest channel and the one I read most carefully. Write to [email protected]. Code lives at github.com/n1kula and gitlab.com/tlkulesza. The site has an RSS feed if you want new notes pushed at you.