Here's what scaling without observability actually costs:
It's 2 a.m. and the phone rings. “The server has crashed, and we can't get the AGVs to reconnect.” It's the call I had dreaded for months.
I had been helping a system integrator develop a material handling solution using AGVs, PLCs, and multiple software services to coordinate everything. It was an elegant system, with one problem: AGVs would sometimes disconnect from the system and come to a complete halt, seemingly at random. We had no idea what was causing it, and troubleshooting was a nightmare.
To diagnose the problem, we had to send engineers to the site to watch the system in real time, wait for a disconnect to happen, then manually trace network requests across multiple fragmented logs. We had a 12-hour window to do it before the logs were overwritten and the evidence was gone.
Eventually, the customer had had enough. No further projects until this issue was resolved. The result: $40M in revenue, gone.
The painful part? We had all the data we needed to solve this. It was just scattered, fragmented, and disappearing faster than we could chase it. We didn't have a technology problem. We had a visibility problem.