Dropbox Outage Postmortem: Not Hacked, Just Another Maintenance Fiasco


From Dropbox:

…On Friday at 5:30 PM PT, we had a planned maintenance scheduled to upgrade the OS on some of our machines.

…In this case, a bug in the script caused the upgrade to run on a handful of machines serving production traffic.

…some master-slave pairs were impacted which resulted in the site going down.

…We were able to restore most functionality within 3 hours, but the large size of some of our databases slowed recovery, and it took until 4:40 PM PT today for core service to fully return.
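
The excerpts above do not show the script or the bug, so the following is only a minimal sketch of the kind of guard that is meant to stop an upgrade from touching machines that serve production traffic. Everything in it is hypothetical: the `Host` type, the `serving_production` flag, the host names, and the `select_upgrade_targets` function are illustrative assumptions, not Dropbox's tooling.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    # Hypothetical flag: how a real system detects live traffic is an assumption.
    serving_production: bool


def select_upgrade_targets(hosts):
    """Split hosts into names that are safe to upgrade and names to skip."""
    safe, skipped = [], []
    for host in hosts:
        if host.serving_production:
            # Never reinstall a machine that is still serving users.
            skipped.append(host.name)
        else:
            safe.append(host.name)
    return safe, skipped


if __name__ == "__main__":
    # Made-up inventory for illustration only.
    inventory = [
        Host("db-spare-1", serving_production=False),
        Host("db-master-1", serving_production=True),
    ]
    safe, skipped = select_upgrade_targets(inventory)
    print("upgrade:", safe)
    print("skip:", skipped)
```

A check like this is only as reliable as the signal behind `serving_production`; if that signal is wrong, the guard passes production machines through, which is the general failure mode the postmortem describes.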

Deeper details