Category Archives: Outage

BT outage impacts 10% of customers in capital

BT has confirmed around 10% of its customers experienced an outage this morning, which has reportedly been linked to a power incident at the former Telecity LD8 site in London, now owned by Equinix, reports Telecoms.com.

BT first acknowledged the outage this morning on Twitter, which took down broadband services for a number of customers in the London area.

The LD8 data centre in London’s Docklands houses the London Internet Exchange (LINX), one of the world’s largest Internet Exchanges with more than 700 members, including ISPs such as BT and Virgin Media as well as content providers.

“We’re sorry that some BT and Plusnet customers experienced problems accessing some internet services this morning,” said a BT spokesperson. “Around 10% of customers’ internet usage was affected following power issues at one of our internet connection partners’ sites in London. The issue has now been fixed and services have been restored.”

While the statement says the problem was limited to London, BT’s service status page indicates dozens of cities and towns across the UK experienced issues. These have not yet been directly linked to the same incident.

The LD8 data centre has only been under Equinix’s control for a matter of months, since the US company acquired Telecity for $3.8 billion. Equinix claims it is now the largest retail colocation provider in Europe and globally, after the deal added 34 data centres to its portfolio, though eight assets had to be off-loaded to satisfy the European Commission’s competition authorities.

“Equinix can confirm that we experienced a brief outage at the former Telecity LD8 site in London earlier this morning,” said an Equinix spokesperson. “This impacted a limited number of customers; however, service was restored within minutes. Equinix engineers are on site and actively working with customers to minimise the impact.”

During email exchanges with Telecoms.com, neither BT nor Equinix named the other party directly, which is understandable given the sensitivity of the issue. Despite BT stating all services have been recovered, at the time of writing the service status page lists dozens of towns and cities that are still experiencing problems. Although these have not been directly linked, as long as service problems continue BT is likely to face a mounting customer service challenge.

AWS releases statement to explain Aussie outage

AWS has cited a power failure caused by adverse weather conditions as the primary cause of the outage Australian customers experienced this weekend.

A statement on the company’s website said its utility provider suffered a failure at a regional substation, which resulted in the total loss of utility power to multiple AWS facilities. At one of these facilities the power redundancy didn’t work as designed, and the company lost power to a large number of instances in the affected Availability Zone.

The storm this weekend was one of the worst experienced by Sydney in recent years, recording 150mm of rain over the period, with 93mm falling on Sunday 5th alone, and wind speeds reaching as high as 96km/h. The storm resulted in AWS customers losing services for up to six hours, between 11.30pm and 4.30am (PST) on June 4/5. The company claims over 80% of the impacted customer instances and volumes were back online and operational by 1am, though a latent bug in the instance management software led to a slower than expected recovery for some of the services.

While adverse weather conditions cannot be avoided, the outage is unlikely to ease concerns over public cloud propositions. Although the concept of cloud may now be considered mainstream, there are still numerous decision makers who are hesitant to place mission-critical workloads in such an environment, as it can be seen as handing control of a company’s assets to another organisation. Such outages will do little to bolster the confidence of those who are already sceptical.

“Normally, when utility power fails, electrical load is maintained by multiple layers of power redundancy,” the statement said. “Every instance is served by two independent power delivery line-ups, each providing access to utility power, uninterruptible power supplies (UPSs), and back-up power from generators. If either of these independent power line-ups provides power, the instance will maintain availability. During this weekend’s event, the instances that lost power lost access to both their primary and secondary power as several of our power delivery line-ups failed to transfer load to their generators.”
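To put the quoted redundancy model in concrete terms, the sketch below simulates the behaviour AWS describes: each instance draws on two independent power line-ups and stays up as long as either one successfully transfers load to its generator. The success probability and trial count are purely illustrative assumptions, not figures published by AWS, and the model deliberately ignores the correlated transfer failures that actually occurred.

```python
import random

# Illustrative model of the dual power line-up described in the AWS statement.
# Each instance is fed by two independent line-ups; it stays up if at least
# one of them still delivers power after a utility failure.
# p_transfer_ok is a hypothetical probability that a line-up successfully
# transfers load to its generator -- not a figure published by AWS.

def instance_survives(p_transfer_ok: float) -> bool:
    """Return True if at least one of two independent line-ups keeps power."""
    lineup_a = random.random() < p_transfer_ok
    lineup_b = random.random() < p_transfer_ok
    return lineup_a or lineup_b

def estimated_availability(p_transfer_ok: float, trials: int = 100_000) -> float:
    """Monte Carlo estimate of the fraction of instances that keep power."""
    survived = sum(instance_survives(p_transfer_ok) for _ in range(trials))
    return survived / trials

if __name__ == "__main__":
    # With truly independent line-ups, even a mediocre 90% transfer success
    # rate leaves only ~1% of instances dark (0.1 * 0.1 = 0.01)...
    print(f"independent line-ups: {estimated_availability(0.90):.3f}")
    # ...but the weekend's failure was correlated: several line-ups failed to
    # transfer at once, which this independence assumption does not capture.
```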

In an effort to avoid similar episodes in the future, the team said additional breakers will be added to ensure connections to degraded utility power are broken more quickly, allowing generators to activate before the uninterruptible power supply systems are depleted. The team has also prioritised reviewing and redesigning the power configuration process in its facilities to prevent similar power sags from affecting performance in the future.
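The reasoning behind the extra breakers can be expressed as a simple timing budget: disconnecting from degraded utility power plus starting the generator must complete before the UPS batteries run out. The figures in the sketch below are hypothetical placeholders for illustration, not values published by AWS.

```python
# Hypothetical timing budget for the fix described above: the switch to
# generator power must complete before the UPS batteries are exhausted.
# None of these figures come from AWS; they are placeholders for illustration.

BREAKER_TRIP_SECONDS = 2        # time to disconnect degraded utility power
GENERATOR_START_SECONDS = 30    # time for the generator to start and take load
UPS_RUNTIME_SECONDS = 300       # how long the UPS can carry the load alone

def transfer_within_ups_budget(breaker_s: float, generator_s: float,
                               ups_runtime_s: float) -> bool:
    """True if the generator picks up the load before the UPS is depleted."""
    return (breaker_s + generator_s) <= ups_runtime_s

if __name__ == "__main__":
    ok = transfer_within_ups_budget(BREAKER_TRIP_SECONDS,
                                    GENERATOR_START_SECONDS,
                                    UPS_RUNTIME_SECONDS)
    print("transfer completes before UPS depletion:", ok)
```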

“We are never satisfied with operational performance that is anything less than perfect, and we will do everything we can to learn from this event and use it to drive improvement across our services,” the company said.

Google cloud team launches damage control mission

Google will offer service credits to all customers affected by the Google Compute Engine outage, in what would appear to be a damage control exercise as the company looks to gain ground on AWS and Microsoft Azure in the public cloud market segment.

On Monday, 11 April, Google Compute Engine instances in all regions lost external connectivity for a total of 18 minutes. The outage has been blamed on two separate bugs, neither of which would have caused any major problems on its own, though in combination they took the service down. Although the incident has seemingly caused embarrassment for the company, it did not impact more visible consumer services such as Google Maps or Gmail.

“We recognize the severity of this outage, and we apologize to all of our customers for allowing it to occur,” said Benjamin Treynor Sloss, VP of Engineering at Google, in a statement on the company’s blog. “As of this writing, the root cause of the outage is fully understood and GCE is not at risk of a recurrence. Additionally, our engineering teams will be working over the next several weeks on a broad array of prevention, detection and mitigation systems intended to add additional defence in depth to our existing production safeguards.

“We take all outages seriously, but we are particularly concerned with outages which affect multiple zones simultaneously because it is difficult for our customers to mitigate the effect of such outages. It is our hope that, by being transparent and providing considerable detail, we both help you to build more reliable services and we demonstrate our ongoing commitment to offering you a reliable Google Cloud platform.”

While the outage would not appear to have caused any major damage for the company, competitors in the space may secretly be pleased with the level of publicity the incident has received. Google has been ramping up its cloud computing efforts in recent months to tackle the public cloud market segment, hiring industry hard-hitters such as Diane Greene, being linked with acquisitions, and announcing plans to open 12 new data centres by the end of 2017.

The company currently sits in third place in the public cloud market segment, behind AWS and Microsoft Azure, though it had been demonstrating healthy growth in the months prior to the outage.

Google Outages: Did the Latest Hit You?

This time it was Postini:

March 25, 2013 1:38:00 PM PDT

We’re investigating reports of an issue with Postini Services.

March 25, 2013 2:38:00 PM PDT

Postini Services service has already been restored for some users, and we expect a resolution for all users within the next 1 hours. Please note this time frame is an estimate and may change. (editor’s note: resolution took over six more hours).

March 25, 2013 9:05:00 PM PDT

The problem with Postini Services should be resolved. We apologize for the inconvenience and thank you for your patience and continued support. Please rest assured that system reliability is a top priority at Google, and we are making continuous improvements to make our systems better.