{"id":39230,"date":"2019-07-03T09:42:31","date_gmt":"2019-07-03T09:42:31","guid":{"rendered":"http:\/\/icloud.pe\/blog\/?guid=018c26946a0bb6dc518a3fb4e9ceac2d"},"modified":"2019-07-03T09:42:31","modified_gmt":"2019-07-03T09:42:31","slug":"software-glitch-to-blame-for-global-cloudflare-outage","status":"publish","type":"post","link":"https:\/\/icloud.pe\/blog\/software-glitch-to-blame-for-global-cloudflare-outage\/","title":{"rendered":"\u2018Software glitch\u2019 to blame for global Cloudflare outage"},"content":{"rendered":"<p><span class=\"field field-name-field-author field-type-node-reference field-label-hidden\"><br \/>\n      <span class=\"field-item even\"><a href=\"https:\/\/www.cloudpro.co.uk\/authors\/keumars-afifi-sabet\">Keumars Afifi-Sabet<\/a><\/span><br \/>\n  <\/span><\/p>\n<div class=\"field field-name-field-published-date field-type-datetime field-label-hidden\">\n<div class=\"field-items\">\n<div class=\"field-item even\"><span class=\"date-display-single\">3 Jul, 2019<\/span><\/div>\n<\/div>\n<\/div>\n<p class=\"short-teaser\">\n<a href=\"https:\/\/www.cloudpro.co.uk\/\" title=\"\" class=\"combined-link\"><\/a><\/p>\n<div class=\"field field-name-body\">\n<p>Cloudflare has resolved an issue that caused websites served by the networking and internet security firm to show 502 \u2018Bad Gateway\u2019 errors en masse for half an hour yesterday.<\/p>\n<p>From 2:42pm BST the networking giant suffered a massive spike in CPU utilisation across its network, which Cloudflare blamed on a bad software deployment. 
This affected websites hosted across the world.<\/p>\n<div id=\"file-7249\" class=\"file file-image file-image-png file-content-full-width\">\n<div class=\"content\">    <img decoding=\"async\" src=\"https:\/\/cdn1.cloudpro.co.uk\/sites\/cloudprod7\/files\/styles\/insert_main_wide_image\/public\/2019\/07\/cloudfare_outage.png?itok=KAALxIia\" alt=\"\" \/>  <\/div>\n<\/div>\n<p><strong><em>Ironically, even Downdetector was knocked offline during the outage<\/em><\/strong><\/p>\n<p>Once this faulty deployment was rolled back, its <a href=\"https:\/\/blog.cloudflare.com\/cloudflare-outage\/\" >CTO John Graham-Cumming explained<\/a>, service returned to normal operation and all domains using Cloudflare saw traffic return to normal levels.<\/p>\n<p>\u201cThis was not an attack (as some have speculated) and we are incredibly sorry that this incident occurred,\u201d Graham-Cumming said.<\/p>\n<p>\u201cInternal teams are meeting as I write performing a full post-mortem to understand how this occurred and how we prevent this from ever occurring again.\u201d<\/p>\n<p>The incident affected several major industries, including cryptocurrency markets, with users unable to access exchanges such as CoinMarketCap and Coinbase.<\/p>\n<div class=\"wysiwyg-widget-wrapper\">\n<blockquote class=\"twitter-tweet\" data-lang=\"en\">\n<p lang=\"en\" dir=\"ltr\">ALERT: Due to a cloudflare outage, we&#39;re getting bad data from our providers, which is showing incorrect crypto prices. 
Calm down everyone, Bitcoin is not $26.<\/p>\n<p>&mdash; CoinDesk (@coindesk) <a href=\"https:\/\/twitter.com\/coindesk\/status\/1146056874988642306?ref_src=twsrc%5Etfw\">July 2, 2019<\/a><\/p>\n<\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/div>\n<p>\nCloudflare issued an update last night stating that the global outage was caused by a single misconfigured rule deployed within the Cloudflare Web Application Firewall (WAF) during a routine update. The company had aimed to improve the blocking of inline JavaScript used in cyber attacks.<\/p>\n<p>One of the rules it deployed caused CPU usage to spike to 100% on its machines worldwide, which led to the 502 errors seen on sites across the world. Web traffic dropped by 82% at the worst point during the outage.<\/p>\n<div class=\"wysiwyg-widget-wrapper\">\n<blockquote class=\"twitter-tweet\" data-lang=\"en\">\n<p lang=\"en\" dir=\"ltr\">Massive spike in CPU usage caused primary and backup systems to fall over. Impacted all services. No evidence yet attack related. Shut down service responsible for CPU spike and traffic back to normal levels. 
Digging in to root cause.<\/p>\n<p>&mdash; Matthew Prince \ud83c\udf25 (@eastdakota) <a href=\"https:\/\/twitter.com\/eastdakota\/status\/1146065231270907907?ref_src=twsrc%5Etfw\">July 2, 2019<\/a><\/p>\n<\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/div>\n<p>\n\u201cWe were seeing an unprecedented CPU exhaustion event, which was novel for us as we had not experienced global CPU exhaustion before,\u201d Graham-Cumming continued.<\/p>\n<p>\u201cWe make software deployments constantly across the network and have automated systems to run test suites and a procedure for deploying progressively to prevent incidents.<\/p>\n<p>\u201cUnfortunately, these WAF rules were deployed globally in one go and caused today\u2019s outage.\u201d<\/p>\n<p>At 3:02pm BST the company realised what was going on and issued a global kill on the WAF Managed Rulesets, which dropped CPU back to normal levels and restored traffic, before fixing the issue and re-enabling the Rulesets approximately an hour later.<\/p>\n<p>Many on social media speculated during the outage that the 502 Bad Gateway errors might be the result of a distributed denial-of-service (DDoS) attack. However, these suggestions were <a href=\"https:\/\/twitter.com\/jgrahamc\/status\/1146078278278635520\" >quickly quashed and confirmed to be untrue by the firm<\/a>. 
<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Cloudflare has resolved an issue that caused websites served by the networking and internet security firm to show 502 &lsquo;Bad Gateway&rsquo; errors en masse for half an hour yesterday.<\/p>\n","protected":false},"author":433,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[],"tags":[],"class_list":["post-39230","post","type-post","status-publish","format-standard","hentry"],"_links":{"self":[{"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/posts\/39230","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/users\/433"}],"replies":[{"embeddable":true,"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/comments?post=39230"}],"version-history":[{"count":2,"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/posts\/39230\/revisions"}],"predecessor-version":[{"id":39242,"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/posts\/39230\/revisions\/39242"}],"wp:attachment":[{"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/media?parent=39230"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/categories?post=39230"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/icloud.pe\/blog\/wp-json\/wp\/v2\/tags?post=39230"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}