{"id":39052,"date":"2019-06-03T09:09:16","date_gmt":"2019-06-03T09:09:16","guid":{"rendered":"http:\/\/icloud.pe\/blog\/?guid=d6159c6636938bcd6d4b14ca816c4796"},"modified":"2019-06-03T09:09:16","modified_gmt":"2019-06-03T09:09:16","slug":"four-hour-google-cloud-outage-blamed-on-network-congestion","status":"publish","type":"post","link":"https:\/\/icloud.pe\/blog\/four-hour-google-cloud-outage-blamed-on-network-congestion\/","title":{"rendered":"Four-hour Google Cloud outage blamed on &#8216;network congestion&#8217;"},"content":{"rendered":"<p><span class=\"field field-name-field-author field-type-node-reference field-label-hidden\"><br \/>\n      <span class=\"field-item even\"><a href=\"https:\/\/www.cloudpro.co.uk\/authors\/jane-mccallion\">Jane McCallion<\/a><\/span><br \/>\n  <\/span><\/p>\n<div class=\"field field-name-field-published-date field-type-datetime field-label-hidden\">\n<div class=\"field-items\">\n<div class=\"field-item even\"><span class=\"date-display-single\">3 Jun, 2019<\/span><\/div>\n<\/p><\/div>\n<\/div>\n<p class=\"short-teaser\">\n<a href=\"https:\/\/www.cloudpro.co.uk\/\" title=\"\" class=\"combined-link\"><\/a><\/p>\n<div class=\"field field-name-body\">\n<p> Google Cloud Platform (GCP) suffered a significant outage on Sunday night that lasted nearly four hours, knocking offline services including <a href=\"https:\/\/www.cloudpro.co.uk\/collaboration\/7993\/google-g-suite-review-suite-like-chocolate\" >G Suite<\/a>, YouTube and Google Cloud.<\/p>\n<p>The issue was first noted on <a href=\"https:\/\/status.cloud.google.com\/incident\/compute\/19003%E2%80%9D\" >the company\u2019s cloud status dashboard<\/a> at 8.25pm BST on 2 June as a Google Compute Engine problem.<\/p>\n<p>Soon afterwards, however, reports of problems with Google Cloud, YouTube and more started hitting Twitter and by 8.59pm, the dashboard acknowledged it was a \u201cwider network issue\u201d.<\/p>\n<div class=\"wysiwyg-widget-wrapper\">\n<blockquote class=\"twitter-tweet\" 
data-lang=\"en\">\n<p lang=\"en\" dir=\"ltr\">Hey there, I&#39;ve asked the Support Team to look into this. Please DM me a contact email address in case we need to reach out to you. -BI<\/p>\n<p>&mdash; Google Cloud (@googlecloud) <a href=\"https:\/\/twitter.com\/googlecloud\/status\/1135267992223256578?ref_src=twsrc%5Etfw\">June 2, 2019<\/a><\/p>\n<\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/div>\n<div class=\"wysiwyg-widget-wrapper\">\n<blockquote class=\"twitter-tweet\" data-lang=\"en\">\n<p lang=\"en\" dir=\"ltr\">If YouTube isn&#39;t loading for you or you&#39;re experiencing error messages, we&#39;re working to fix it!<\/p>\n<p>&mdash; TeamYouTube (@TeamYouTube) <a href=\"https:\/\/twitter.com\/TeamYouTube\/status\/1135271882985312256?ref_src=twsrc%5Etfw\">June 2, 2019<\/a><\/p>\n<\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/div>\n<p>\nBy 12.09am on 3 June, the issue was resolved but little detail is available as to what happened beyond \u201chigh levels of network congestion in the eastern USA, affecting multiple services in Google Cloud, G Suite and YouTube\u201d.<\/p>\n<p>However, someone claiming to work on Google Cloud (but currently on holiday) posted a message on <a href=\"https:\/\/news.ycombinator.com\/item?id=20077421%E2%80%9D\"><em>Hacker News<\/em><\/a> saying: \u201cIt&#8217;s disrupting everything, including unfortunately the tooling we usually use to communicate across the company about outages.\u201d<\/p>\n<p>\u201cThere are backup plans, of course, but I wanted to at least come here to say: you&#8217;re not crazy, nothing is lost &#8230; but there is serious packet loss at the least,&#8221; they added.\u00a0<\/p>\n<p><!--wysiwyg_see-related_plugin--><\/p>\n<p>\nIn a statement, Google told <em>Cloud Pro<\/em>: \u201cWe will conduct a post mortem and make appropriate improvements to our systems to prevent 
this from happening again. We sincerely apologise to those that were impacted by [these] issues. Customers can always find the most recent updates on our systems on our status dashboard.\u201d<\/p>\n<p>Some, however, have questioned what exactly Google meant by \u201chigh levels of network congestion in the eastern USA\u201d.<\/p>\n<p>Clive Longbottom, co-founder of analyst house Quocirca, told <em>Cloud Pro<\/em>: \u201cIf this was the case, a lot more than GCP would have been impacted: this does not seem to have been the case. As such, it would appear that what Google possibly means is that it was excessive network traffic in its own environment in the Eastern USA.\u201d<\/p>\n<p>He suggested that the excessive network traffic was potentially caused by something internal.<\/p>\n<p>\u201cThis could be something like a memory leak on an app going crazy, or (<a href=\"https:\/\/www.cloudpro.co.uk\/leadership\/risks\/6656\/aws-blames-human-error-and-s3s-gargantuan-scale-for-outage\" >like AWS some time back)<\/a> human error through a script causing a looping command bringing chaos to the environment.\u201d<\/p>\n<p>This doesn\u2019t mean that organisations should abandon cloud for business-critical workloads, however. Owen Rogers, research director at the digital economics unit of 451 Research, told <em>Cloud Pro<\/em>: \u201cFour hours is quite a long time \u2026 but it\u2019s a tricky issue, because outages are going to happen now and then, and all customers can do is to build resiliency such that if an outage does occur, they have a backup.<\/p>\n<p>\u201cUsing multiple availability zones and regions is a must, but if applications are business critical, <a href=\"https:\/\/www.cloudpro.co.uk\/it-infrastructure\/7610\/amazon-prime-day-a-lesson-in-how-not-to-handle-an-it-outage\" >multi-cloud should be considered<\/a>. Yes, it\u2019s more complex to manage; yes, you\u2019ll have to train more people. 
But if your company is going to go bust because of a few hours of outage, it is an investment worth making. It appears some hyperscalers are more resilient than others, but even the best are likely to slip up occasionally.\u201d <\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p><span><br \/>\n      <span><a href=\"https:\/\/www.cloudpro.co.uk\/authors\/jane-mccallion\">Jane McCallion<\/a><\/span><br \/>\n  <\/span><\/p>\n<div>\n<div>\n<div><span>3 Jun, 2019<\/span><\/div>\n<\/p><\/div>\n<\/div>\n<p>\n<a href=\"https:\/\/www.cloudpro.co.uk\/\" title=\"\"><\/a><\/p>\n<div>\n<p> Google Cloud Platform (GCP) suffered a significant outage on Sunday night that lasted nearly four hours, knocking offline services including <a href=\"https:\/\/www.cloudpro.co.uk\/collaboration\/7993\/google-g-suite-review-suite-like-chocolate\" target=\"_blank\" rel=\"noopener noreferrer\">G Suite<\/a>, YouTube and Google Cloud.<\/p>\n<p>The issue was first noted on <a href=\"https:\/\/status.cloud.google.com\/incident\/compute\/19003%E2%80%9D\" target=\"_blank\" rel=\"noopener noreferrer\">the company&rsquo;s cloud status dashboard<\/a> at 8.25pm BST on 2 June as a Google Compute Engine problem.<\/p>\n<p>Soon afterwards, however, reports of problems with Google Cloud, YouTube and more started hitting Twitter and by 8.59pm, the dashboard acknowledged it was a &ldquo;wider network issue&rdquo;.<\/p>\n<div>\n<blockquote data-lang=\"en\">\n<p lang=\"en\" dir=\"ltr\">Hey there, I&#8217;ve asked the Support Team to look into this. Please DM me a contact email address in case we need to reach out to you. 
-BI<\/p>\n<p>&mdash; Google Cloud (@googlecloud) <a href=\"https:\/\/twitter.com\/googlecloud\/status\/1135267992223256578?ref_src=twsrc%5Etfw\">June 2, 2019<\/a><\/p>\n<\/blockquote>\n<\/div>\n<div>\n<blockquote data-lang=\"en\">\n<p lang=\"en\" dir=\"ltr\">If YouTube isn&#8217;t loading for you or you&#8217;re experiencing error messages, we&#8217;re working to fix it!<\/p>\n<p>&mdash; TeamYouTube (@TeamYouTube) <a href=\"https:\/\/twitter.com\/TeamYouTube\/status\/1135271882985312256?ref_src=twsrc%5Etfw\">June 2, 2019<\/a><\/p>\n<\/blockquote>\n<\/div>\n<p>\nBy 12.09am on 3 June, the issue was resolved but little detail is available as to what happened beyond &ldquo;high levels of network congestion in the eastern USA, affecting multiple services in Google Cloud, G Suite and YouTube&rdquo;.<\/p>\n<p>However, someone claiming to work on Google Cloud (but currently on holiday) posted a message on <a href=\"https:\/\/news.ycombinator.com\/item?id=20077421%E2%80%9D\"><em>Hacker News<\/em><\/a> saying: &ldquo;It&#8217;s disrupting everything, including unfortunately the tooling we usually use to communicate across the company about outages.&rdquo;<\/p>\n<p>&ldquo;There are backup plans, of course, but I wanted to at least come here to say: you&#8217;re not crazy, nothing is lost &#8230; but there is serious packet loss at the least,&#8221; they added.&nbsp;<\/p>\n<p><!--wysiwyg_see-related_plugin--><\/p>\n<p>\nIn a statement, Google told <em>Cloud Pro<\/em>: &ldquo;We will conduct a post mortem and make appropriate improvements to our systems to prevent this from happening again. We sincerely apologise to those that were impacted by [these] issues. 
Customers can always find the most recent updates on our systems on our status dashboard.&rdquo;<\/p>\n<p>Some, however, have questioned what exactly Google meant by &ldquo;high levels of network congestion in the eastern USA&rdquo;.<\/p>\n<p>Clive Longbottom, co-founder of analyst house Quocirca, told <em>Cloud Pro<\/em>: &ldquo;If this was the case, a lot more than GCP would have been impacted: this does not seem to have been the case. As such, it would appear that what Google possibly means is that it was excessive network traffic in its own environment in the Eastern USA.&rdquo;<\/p>\n<p>He suggested that the excessive network traffic was potentially caused by something internal.<\/p>\n<p>&ldquo;This could be something like a memory leak on an app going crazy, or (<a href=\"https:\/\/www.cloudpro.co.uk\/leadership\/risks\/6656\/aws-blames-human-error-and-s3s-gargantuan-scale-for-outage\" target=\"_blank\" rel=\"noopener noreferrer\">like AWS some time back)<\/a> human error through a script causing a looping command bringing chaos to the environment.&rdquo;<\/p>\n<p>This doesn&rsquo;t mean that organisations should abandon cloud for business-critical workloads, however. Owen Rogers, research director at the digital economics unit of 451 Research, told <em>Cloud Pro<\/em>: &ldquo;Four hours is quite a long time &hellip; but it&rsquo;s a tricky issue, because outages are going to happen now and then, and all customers can do is to build resiliency such that if an outage does occur, they have a backup.<\/p>\n<p>&ldquo;Using multiple availability zones and regions is a must, but if applications are business critical, <a href=\"https:\/\/www.cloudpro.co.uk\/it-infrastructure\/7610\/amazon-prime-day-a-lesson-in-how-not-to-handle-an-it-outage\" target=\"_blank\" rel=\"noopener noreferrer\">multi-cloud should be considered<\/a>. Yes, it&rsquo;s more complex to manage; yes, you&rsquo;ll have to train more people. 
But if your company is going to go bust because of a few hours of outage, it is an investment worth making. It appears some hyperscalers are more resilient than others, but even the best are likely to slip up occasionally.&rdquo; <\/p>\n<\/p><\/div>\n","protected":false},"author":415,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[],"tags":[],"class_list":["post-39052","post","type-post","status-publish","format-standard","hentry"],"_links":{"self":[{"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/posts\/39052","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/users\/415"}],"replies":[{"embeddable":true,"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/comments?post=39052"}],"version-history":[{"count":3,"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/posts\/39052\/revisions"}],"predecessor-version":[{"id":39068,"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/posts\/39052\/revisions\/39068"}],"wp:attachment":[{"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/media?parent=39052"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/categories?post=39052"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/tags?post=39052"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}