“This is interesting—didn’t we see this same thing in the logs just before we crashed last time?” – an anonymous and somewhat embarrassed DevOps team member
Unfortunately, the “same thing” is an obscure, otherwise uninteresting log entry in a directory that’s rarely checked. Worse, the “last time” was two months ago, when the team’s eCommerce site went down on Black Friday, costing hundreds of thousands of dollars. They had noticed this same entry back then but only now made the correlation. Companies have made great strides in speeding up correlations like this and improving MTTR. The question that always arises (usually not long after the CFO sees the price tag) goes something like, “Is it really worth deploying an agent to all of our key app servers for the sole purpose of gathering and merging disconnected artifacts (logs), in the hope of finding an interesting correlation when it counts?”