How to even start this post? Well, here we go… Imagine this: you’re granted access to a massive platform that’s been patched over the years like an old race car. It’s a labyrinth of services, each one wired to the others like a giant plate of spaghetti. You tweak one microservice, and… bam! Three others spin out of control and you give up once again. Then, stakeholders decide you’ll “make it faster, cheaper, and bulletproof by next sprint”. Sounds like a nightmare? It is! Welcome to the chaotic world of platform optimization!
How do you even start fixing this beast? You could tune a database query here, split a monolith there, maybe shave a few milliseconds off a function. But what about the thousands of other tweaks you could try?
Refactoring a single function? Piece of cake. Profile it, tweak it, test it, done. But optimizing an entire platform is like trying to sculpt a mountain range with a spoon. You might make one service faster, but is it the best setup out of all possible combinations? Welcome to the trap of local minima: settling for a decent fix while a game-changing architecture is hiding just over the next ridge.
To find the real deal, you need a smarter way to navigate.
Introducing annealing
Imagine you’re a blacksmith forging a sword. You heat the metal until it’s red-hot and bendy, then slowly cool it to lock in not only the final shape, but a stronger structure. That’s annealing: a process that avoids weak, brittle results by allowing controlled chaos.

In software, simulated annealing does the same thing. You put your platform through “worse” setups (like splitting a service that temporarily slows things down) to escape the annoying local minima and find a better overall architecture.
It’s like taking a detour to find a faster highway.
Simulated Annealing is the algorithm that makes this magic happen. It starts with your current platform setup, measures its “energy” (cost, latency, error rate or whatever), and randomly tweaks it. Sometimes it keeps a worse setup to avoid getting stuck, but it gets pickier as the system “cools down”, homing in on a near-optimal configuration. No need to test every possible combination by hand; we are not animals.
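Here’s a minimal sketch of what that loop looks like in Python. The `measure_energy` and `random_tweak` callables are placeholders you’d supply yourself (how you score a configuration and how you propose a neighbouring one), not part of any real library:

```python
import math
import random

def simulated_anneal(initial_config, measure_energy, random_tweak,
                     start_temp=1.0, cooling=0.95, steps=500):
    """Generic simulated annealing loop.

    measure_energy(config) -> float  (lower is better: cost, latency, errors...)
    random_tweak(config)   -> a slightly different candidate config
    """
    current = initial_config
    current_energy = measure_energy(current)
    best, best_energy = current, current_energy
    temp = start_temp

    for _ in range(steps):
        candidate = random_tweak(current)
        candidate_energy = measure_energy(candidate)
        delta = candidate_energy - current_energy

        # Always accept improvements; sometimes accept worse setups,
        # with a probability that shrinks as the system cools down.
        if delta < 0 or random.random() < math.exp(-delta / temp):
            current, current_energy = candidate, candidate_energy
            if current_energy < best_energy:
                best, best_energy = current, current_energy

        temp *= cooling  # get pickier over time

    return best, best_energy
```

In practice the “config” could be anything you can mutate and score: a dict of service boundaries, instance sizes and cache settings, with `random_tweak` flipping one knob at a time.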
Cool note: There’s also something called Quantum Annealing, which uses quantum hardware to zip through multiple configurations at once, like a race car with a teleporter. Cool, niche and freaking expensive, so simulated annealing is your go-to for most platform problems.
Simulated annealing in the AI era
We’re so lucky to be building in the age of AI and agentic systems. Just 3 years ago, a platform-wide refactor meant months of meetings, fear, rollback plans and more fear. Today you can literally instruct your AI coding agent to “use simulated annealing to explore the refactor space”, define what “energy” means for your platform (p99 latency, error rate, infra cost, a weighted blend, etc.), and let it go nuts while you sip coffee.
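What “energy” means is entirely up to you. As a rough illustration, here’s one possible weighted blend; the metric names and weights are made up, so swap in whatever your platform actually reports:

```python
# One possible "energy" definition: a weighted blend of the metrics you
# care about. Lower is better, so the annealer tries to drive this down.
def platform_energy(metrics):
    return (
        0.5 * metrics["p99_latency_ms"] / 100            # ~100 ms normalises to 1.0
        + 0.3 * metrics["error_rate"] * 1000             # errors hurt a lot
        + 0.2 * metrics["monthly_infra_cost_usd"] / 10_000
    )
```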
This approach is actually quite simple. Just be sure to give your agent some strict guardrails (tests, SLO gates, budgets), so you get a good balance of exploration and stability.
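Those guardrails can be as simple as a hard gate the agent must pass before a candidate setup is even scored. A hypothetical sketch, with example thresholds you’d obviously replace with your own SLOs and budgets:

```python
# Hypothetical guardrail gate: candidates that break the test suite or
# blow an SLO or budget are rejected outright, regardless of their energy.
def passes_guardrails(metrics, tests_passed):
    return (
        tests_passed
        and metrics["p99_latency_ms"] <= 500              # SLO gate
        and metrics["error_rate"] <= 0.01
        and metrics["monthly_infra_cost_usd"] <= 50_000   # budget cap
    )
```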
If you feel lost facing a heavy refactor, I recommend using simulated annealing as your default approach. Run it on a staging mirror, review the Pareto front, then promote with confidence.
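If you track several metrics instead of a single blended energy, the Pareto front is just the set of configurations that nothing else beats on every metric at once. A minimal filter, assuming each candidate is a dict of metrics where lower is better:

```python
# Minimal Pareto-front filter: keep configurations that no other
# candidate dominates (better or equal on every metric, strictly better on one).
def pareto_front(candidates, metric_keys=("p99_latency_ms", "error_rate", "cost")):
    def dominates(a, b):
        return (all(a[k] <= b[k] for k in metric_keys)
                and any(a[k] < b[k] for k in metric_keys))

    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other is not c)]
```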