What’s the Key to Debugging in Today’s Distributed Systems?

 

THE NEW STACK
By Joab Jackson


Test software updates in production. It goes against conventional wisdom, but in today’s distributed systems, trying to make a staging environment mirror production is a fool’s errand. Charity Majors, CEO of Honeycomb.io, made this case in a presentation at Gremlin’s recent Chaos Conference.

The negative reaction to the idea of testing in production is based on a false dichotomy. Software development is, at heart, an exercise in repeated testing, and the objection assumes there are only two ways to do it. One is to test software entirely in its own “sandboxed” environment. The other is to push the new code to the cloud for all users at once.

But there are multiple techniques, such as A/B testing and canary testing, that allow you to try code on a small portion of the user base, so you can collect metrics before rolling the update out system-wide.
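To make the idea concrete, here is a minimal sketch of how a canary split might be decided per user. It is an illustration under stated assumptions, not a method described in the talk: the names `in_canary` and `pick_version`, the `salt` value, and the 5 percent default are all hypothetical. The sketch hashes each user ID into one of 10,000 buckets so the same user always lands on the same side of the split.

```python
import hashlib


def in_canary(user_id: str, percent: float, salt: str = "release-1") -> bool:
    """Return True if this user falls inside the canary slice.

    Hypothetical helper: hashing (salt + user_id) gives a stable bucket
    in the range 0..9999, so a given user sees the same version on
    every request for as long as the salt is unchanged.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    # percent=5.0 selects buckets 0..499, i.e. roughly 5% of users.
    return bucket < percent * 100


def pick_version(user_id: str, percent: float = 5.0) -> str:
    """Route a user to the new code path or the stable one."""
    return "new" if in_canary(user_id, percent) else "stable"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    share = sum(pick_version(u) == "new" for u in users) / len(users)
    print(f"share of users on the new version: {share:.1%}")
```

Raising `percent` in steps widens the rollout without reshuffling users who already have the new version, which keeps the metrics collected from the canary group comparable over time.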

With distributed systems, what can go wrong is an infinitely long tail of things that will probably never happen, until one day they do. “How are you going to find that in a staging environment? Spoiler alert: You’re not.”

This is why observability, which Honeycomb specializes in, is so important: you can’t predict where the failure will occur.


Marta Bulaich