Category Archives: Outage Alert

Dropbox Outage Postmortem: Not Hacked, Just Another Maintenance Fiasco

 

From Dropbox:

…On Friday at 5:30 PM PT, we had a planned maintenance scheduled to upgrade the OS on some of our machines.

…In this case, a bug in the script caused the upgrade to run on a handful of machines serving production traffic.

…some master-slave pairs were impacted which resulted in the site going down.

…We were able to restore most functionality within 3 hours, but the large size of some of our databases slowed recovery, and it took until 4:40 PM PT today for core service to fully return.
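A maintenance script can defend against this class of mistake by refusing to touch any host that is still serving traffic. The sketch below is only an illustration of that idea and is not Dropbox's script: the `Host` type, the `serving_production` flag, and the host names are all hypothetical, and a real check would query live state rather than trust a static flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    # Hypothetical flag; a real script would query live traffic or replication state.
    serving_production: bool


def select_upgrade_targets(candidates):
    """Split candidate hosts into those safe to upgrade and those to skip."""
    safe, skipped = [], []
    for host in candidates:
        if host.serving_production:
            skipped.append(host)  # never upgrade a host that is serving traffic
        else:
            safe.append(host)
    return safe, skipped


if __name__ == "__main__":
    # Made-up host names, for illustration only.
    hosts = [Host("db-spare-1", False), Host("db-master-7", True)]
    safe, skipped = select_upgrade_targets(hosts)
    print("upgrade:", [h.name for h in safe])
    print("skip:", [h.name for h in skipped])
```

Returning the skipped hosts alongside the safe ones lets the operator see what was left out before anything is upgraded.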

Deeper details

Online Storage Provider Nirvanix Reportedly Two Weeks From Shutdown

According to a report today in Information Age:

“US-based cloud storage provider Nirvanix tells employees it has ‘gone to the wall’, gives customers until the end of the month to move their data elsewhere.”

The company was founded in 2007 after an online storage company called StreamLoad split into consumer and business units. Not long after, the consumer arm – MediaMax – gave customers one month to relocate their data following a botched migration onto the Nirvanix platform.

(Source)

Breaking: GitHub Back Up After Today’s DDoS Attack

GitHub was essentially down for about an hour today, starting at around 11 am Eastern (1500 UTC) due to a reported DDoS attack. From their status page:

15:05 UTC We’re currently experiencing a large DDoS attack. The site is experiencing major packet loss and is mostly unavailable. We’re working to further mitigate the attack.

16:10 UTC We’ve mitigated the DDoS attack and the site should be responding normally. We’re still investigating the cause of the small increase in exceptions when accessing the GitHub API.

Hey Network Solutions, New Rule: Use Social During an Outage

Network Solutions is in trouble today. Rumor has it DNS issues are the root cause, but that's unconfirmed. What is clear is that if your site is hosted by NetSol, it is unreachable.

If you dig really hard you can get links to their blog which might offer more detail. But… it’s unreachable (duh).

I picture NetSol personnel happily posting critical updates to a blog only they can reach.

New Rule: If your servers, DNS, routers, or network are experiencing problems, use your Twitter and Facebook accounts to communicate with customers. Don’t want your dirty laundry messing up your marketing? Set up separate Twitter/FB support accounts.

Google Apps Status Sorta Takes Responsibility for Today’s Outage

From the Apps Status Dashboard:

7/10/13 10:40 AM

The problem with Gmail should be resolved. We apologize for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are making continuous improvements to make our systems better.

The affected users were located in West Virginia, North Carolina, Nebraska & Georgia.

Same message repeated for basically all services.

It’s a little vague and incomplete. Was it Comcast (other services were also down and they ALL came back at once here in Florida)? Or was it actually Google?