There are lots of reports that most Google services are down for many users.
The App Status Dashboard is also down as of this writing, so I guess we'll have to rely on Bing and Twitter to learn more…
Google reported a problem with Gmail today and not long after said it was resolved:
3:02 AM: We’re investigating reports of an issue with Google Mail. We will provide more information shortly.
3:43 AM: The problem with Google Mail should be resolved. We apologize for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are making continuous improvements to make our systems better.
Users experienced 502 errors and high latency when accessing email.
Google issued an incident report on Wednesday's outage. The outage affected fewer than one percent of Gmail users but was significant for other services: it hit half of Admin panel login requests and 60% of Google Sync login requests. As has happened in the past, the cause was a configuration error in a central system, in this case Google Services Login. The misconfiguration routed too many requests to too few servers, and those servers buckled under the load:
From 5:00 a.m. to 8:00 a.m. PT, some users received errors when trying to access Gmail, Drive, Talk, Google Sync, the Admin panel, and the Cloud Console, and to a lesser extent Groups, Sites, and Contacts. At the peak of the outage, this issue affected 50% of the Admin panel and 60% of Google Sync login requests. The percentages of affected users for other services were lower such as 0.18% users for Gmail. The root cause was an issue in the system that manages login requests for Google services.
At 5:00 a.m. as login traffic increased, the misconfigured servers were unable to process the load. This began to cause errors for some users logging in to Google services. The request load, exacerbated by retry requests from users and automated systems such as IMAP clients, initially appeared as the cause of the login errors. At 5:48 a.m., the Engineering team determined that the root cause was not excess traffic but insufficient capacity
The full report is less than two pages, and clearly outlines what happened and how they hope to prevent it in the future.
Seven Google services were down for part of today. Take a look at the App Status Dashboard for details. As of this writing, the Admin Control Panel/API still shows a service outage symbol, but the details indicate the problem has been resolved.
According to a survey conducted by Kelton Research for TeamQuest, nearly four in ten respondents reported having suffered a cloud outage:
Many survey respondents believe the reported outages could have been prevented. Capacity management is cited as one way to minimize the risks associated with cloud computing, according to respondents in a survey from Kelton Research, commissioned by TeamQuest Corporation.
This time it was Postini:
March 25, 2013 1:38:00 PM PDT
We’re investigating reports of an issue with Postini Services.
March 25, 2013 2:38:00 PM PDT
Postini Services service has already been restored for some users, and we expect a resolution for all users within the next 1 hours. Please note this time frame is an estimate and may change. (editor’s note: resolution took over six more hours).
March 25, 2013 9:05:00 PM PDT
The problem with Postini Services should be resolved. We apologize for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are making continuous improvements to make our systems better.
Google Drive is stalled again for some users:
March 21, 2013 7:07:00 AM PDT
We are continuing to investigate this issue. We will provide an update by March 21, 2013 8:07:00 AM PDT detailing when we expect to resolve the problem.
Users are able to access Drive, but they may experience slow behavior or sporadic errors.
According to Google, the outage for some Google Drive users should be completely resolved.
Still having a problem? Then Google wants to hear about it:
The problem with Google Drive should be resolved. We apologize for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are making continuous improvements to make our systems better. If you are still experiencing an issue, please contact us via the Google Help Center.
From the Google App Status Dashboard:
March 18, 2013 7:17:00 AM PDT
We’re investigating reports of an issue with Google Drive. We will provide more information shortly.
March 18, 2013 8:10:00 AM PDT
We’re aware of a problem with Google Drive affecting a significant subset of users. The affected users are unable to access Google Drive. We will provide an update by March 18, 2013 9:10:00 AM PDT detailing when we expect to resolve the problem. Please note that this resolution time is an estimate and may change.
March 18, 2013 8:55:00 AM PDT
Google Drive service has already been restored for some users, and we expect a resolution for all users within the next 1 hours. Please note this time frame is an estimate and may change.
Water and servers don't mix. Storms can do more than cut the power to a data center; they can also breach walls, cause flooding, or otherwise damage the facility. A natural disaster like Hurricane Sandy can also make it difficult for staff even to get to work, and can delay the arrival of replacement parts, fuel for generators, and so on.
Two posts at Data Center Knowledge do a good job of outlining how data centers prepared for the storm and what actually happened.