[session] Open Source Tool Chains | @DevOpsSummit @CAinc #CD #DevOps #ContinuousTesting

The goal of Continuous Testing is to shift testing left to find defects earlier and release software faster. This can be achieved by integrating a set of open source functional and performance testing tools into the early stages of your software delivery lifecycle. There is one process that binds all application delivery stages together into one well-orchestrated machine: Continuous Testing. Continuous Testing is the conveyor belt between the Software Factory and production stages. Artifacts are moved from one stage to the next only after they have been tested and approved to continue. New code submitted to the repository is tested upon commit; when tests fail, the code is rejected. Subsystems are approved as part of periodic builds on their way to the delivery stage, where the system is tested as production-ready. The release process stops when tests fail. The key is to shift test creation and execution to the left, rather than creating tests after development is complete. As code is committed and promoted, all tests run in the background near-instantaneously, as there is no longer time for human intervention in a continuous deployment cycle.
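
As a rough illustration of that commit gate, here is a minimal Python sketch that runs a fast test suite and rejects the change on failure; the pytest command and exit-code handling are assumptions for illustration, not tied to any particular CI product.

```python
# Minimal sketch of a "test on commit, reject on failure" gate, e.g. wired into
# a CI step or pre-receive hook. The test command is an illustrative assumption.
import subprocess
import sys

def gate_commit() -> int:
    """Run the fast test suite; a non-zero exit code rejects the change."""
    result = subprocess.run(["pytest", "-q", "--maxfail=1"])
    if result.returncode != 0:
        print("Tests failed: rejecting this change.", file=sys.stderr)
    return result.returncode

if __name__ == "__main__":
    sys.exit(gate_commit())
```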


API Security: OWASP 2017 RC1 Gets It Right | @CloudExpo #API #Cloud #Microservices

API Security has finally entered our security zeitgeist. OWASP Top 10 2017 – RC1 recognized API Security as a first-class citizen by adding it as number 10, or A10, on its list of web application vulnerabilities. We believe this is just the start. The attack surface offered by APIs is orders of magnitude larger than any other. Consider the fact that APIs expose cloud services, internal databases, applications and even legacy mainframes over the internet. What could go wrong?


[slides] End-User Experience for Multi-Cloud Flexibility | @CloudExpo @Cedexis #Agile #DevOps #DataCenter

Enterprise architects are increasingly adopting multi-cloud strategies as they seek to utilize existing data center assets, leverage the advantages of cloud computing and avoid cloud vendor lock-in. This requires a globally aware traffic management strategy that can monitor infrastructure health across data centers and end-user experience globally, while responding to control changes and system specifications at the speed of today’s DevOps teams. In his session at 20th Cloud Expo, Josh Gray, Chief Architect at Cedexis, covered strategies for orchestrating global traffic to achieve the highest-quality end-user experience while spanning multiple clouds and data centers and reacting at the velocity of modern development teams.


Containers: The road ahead to enterprise adoption

It is indisputable that containers are one of the hottest tickets in open source technology, with 451 Research projecting more than 250% growth in the market from 2016 to 2020. It’s easy to see why: container technology combines speed and density with the security of traditional virtual machines, while requiring far smaller operating systems in order to run.

Of course, it’s still early days, and similar question marks faced OpenStack technology on its path to market maturity and widespread revenue generation. Customers are still asking: “Can any of this container stuff actually be used securely in production for an enterprise environment?”

From virtual machines to containers

First, some background. In previous years, virtual machines have provided a solution to workflow expansion and cost reduction for many companies, but they do have limits. For example, virtual machines have a far smaller capacity than containers in terms of the number of applications they can pack into a single physical server. Virtual machines also use up system resources; each virtual machine runs on a full copy of an operating system, as well as a virtual copy of all the hardware that the operating system needs in order to function.

Containers offer a new form of virtualisation, providing levels of resource isolation almost equivalent to a traditional hypervisor. However, containers carry lower overhead, with a smaller memory footprint and higher efficiency. This means that higher density can be achieved – simply put, you can get more for the same hardware.

Enterprise adoption

The telco industry has been at the bleeding edge of adopting container technology. Part of the catalyst for this trend has been the NFV (network function virtualisation) revolution – the concept of telcos shifting what were traditionally welded-shut proprietary hardware appliances into virtual machines.

We certainly do see virtual machines being used in production in some telcos, but containers are actually a stronger fit in some cases; the performance is even better when it comes to NFV applications.

Developers in enterprise environments are aware that containers offer both higher performance for the end user and operational efficiency for the cloud administrator. However, many CIOs are still unsure that containers are the best technology option for them, due to wider market misconceptions. For example, some believe that by using one particular type of container, they will tie themselves to a specific vendor.

Security worries

Another common misconception that can present an obstacle to enterprise adoption concerns security. However, there are several controls in place that enable us to say, with confidence, that an LXD container is more than secure enough to satisfy a CIO who is, understandably, more security-conscious than ever.

One of these is resource control, which, inside the Linux kernel, is provided by a technology called cgroups (control groups), originally engineered at Google in 2006. Cgroups is the fundamental kernel technology that groups processes together and keeps them tightly coupled. This is essentially what a Docker or LXD container is – an illusion that the Linux kernel creates around a group of processes to make them look as though they belong together.

Within LXD and Docker, cgroups allow you to assign limits on resources such as CPU, disk storage or throughput. You can therefore keep one container from taking all of the resources away from other containers. From a security perspective, this is what ensures that a given container cannot mount a denial-of-service (DoS) attack against the containers alongside it, thereby providing quality-of-service guarantees.
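
To make the resource-control point concrete, here is a minimal sketch using the Docker Engine's Python SDK (docker-py); the image, command and limit values are placeholders, not a recommended configuration.

```python
# Illustrative sketch: start a container with cgroup-backed resource limits via
# the Docker Python SDK. Image, command and limit values are placeholders.
import docker

client = docker.from_env()
container = client.containers.run(
    "ubuntu:22.04",
    command="sleep 300",
    detach=True,
    mem_limit="512m",          # cap memory so one container cannot starve the rest
    nano_cpus=1_000_000_000,   # roughly one CPU's worth of time
    blkio_weight=500,          # relative block I/O weight (10-1000)
    pids_limit=256,            # bound process count (fork-bomb protection)
)
print(container.id)
```

Under the hood these options map onto the kernel's cgroup controllers (memory, cpu, blkio, pids), which is exactly the mechanism described above.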

Mandatory access control (MAC) also ensures that neither the container code itself, nor the code run within containers, has a greater degree of access than the process actually requires, so the privileges granted to a rogue or compromised process are minimised.

In essence, the greatest security strength of containers is isolation. Container technology can offer hardware-guaranteed security to ensure that containerised machines cannot access one another. There may be situations where a virtual machine is required for particularly sensitive data, but for the most part containers deliver security. In fact, Canonical designed LXD from day one with security in mind.

IoT is leading the way

Many of the notable trends dominating the tech news agenda over the last couple of years, particularly the Internet of Things, are pushing the shift towards enterprise adoption of containers. Container technology is arguably the ideal response to the scalability and data-related issues presented by the predominance of IoT applications.

Containers, in tandem with edge computing, are optimised for enabling the transmission of data between connected devices and the cloud. Harvesting data from any number of remote devices and processing it calls for extreme scaling. Application containers, with the help of tools such as Docker and Ubuntu Core, which runs app packages for IoT known as “snaps”, can help provide this.

Why containers?

Container technology has brought about a step-change in virtualisation technology. Organisations implementing containers see considerable opportunities to improve agility, efficiency, speed, and manageability within their IT environments. Containers promise to improve data centre efficiency and performance without having to make additional investments in hardware or infrastructure.

For Linux-on-Linux workloads, containers can offer a faster, more efficient and more cost-effective way to create an infrastructure. Companies using these technologies can take advantage of brand-new code, written using modern advances in technology and development discipline.

We see a lot of small to medium organisations adopting container technology as they develop from scratch, but established enterprises of all sizes, and in all industries, can channel this spirit of disruption to keep up with the more agile and scalable new kids on the block. 

Why Netflix is the ideal blueprint for cloud-native computing

The uber poster child of migrating legacy applications and IT systems via the ‘cloud native’ approach is Netflix. Not only do they share their best practices via blogs, they also share, as open source, the software they’ve created to make it possible.

Migrating to web-scale IT

In a VentureBeat article the author envisions ‘the future of enterprise tech’. They describe how pioneering organisations like Netflix are entirely embracing a cloud paradigm for their business, moving away from the traditional approach of owning and operating their own data centre, populated by EMC, Oracle and VMware.

Instead they are moving to ‘web-scale IT’ via on-demand rental of containers, commodity hardware and NoSQL databases; but critically, it’s not just about swapping out the infrastructure components.

Cloud migration best practices

In this blog they focus on the migration of the core Netflix billing systems from their own data centre to AWS, and from Oracle to a Cassandra/MySQL combination, emphasising in particular the scale and complexity of the database migration part of the cloud migration journey.

This initial quote from the Netflix blog sets the scene:

“On January 4, 2016, right before Netflix expanded itself into 130 new countries, Netflix Billing infrastructure became 100% AWS cloud-native.”

They also reference a previous blog describing the overall AWS journey, again quickly making the most incisive point – this time describing the primary inflection point in CIO decision-making that this shift represents, a move to ‘Web Scale IT’:

“That is when we realised that we had to move away from vertically scaled single points of failure, like relational databases in our data centre, towards highly reliable, horizontally scalable, distributed systems in the cloud.”

Cloud migration: Migrating mission-critical systems

They then go on to explain their experiences of a complex migration of highly sensitive, operational customer systems from their own data centre to AWS.

As you might imagine, the core customer billing systems are the backbone of a digital delivery business like Netflix, handling everything from billing transactions to reporting feeds for SOX compliance. They face a ‘change the tyre while the car is still moving’ challenge: keeping front-facing systems available and consistent to ensure unbroken service for a globally expanding audience, while running a background process that migrates terabytes of data from on-site enterprise databases into AWS.

  • We had billions of rows of data, constantly changing and composed of all the historical data since Netflix’s inception in 1997. It was growing every single minute in our large shared database on Oracle. To move all this data over to AWS, we needed to first transport and synchronise the data in real time, into a double digit Terabyte RDBMS in cloud.
  • Being a SOX system added another layer of complexity, since all the migration and tooling needed to adhere to our SOX processes.
  • Netflix was launching in many new countries and marching towards being global soon.
  • Billing migration needed to happen without adversely impacting other teams that were busy with their own migration and global launch milestones.

The scope of the data migration and the real-time requirements highlight the challenging nature of cloud migrations, and how far they go beyond a simple lift and shift of an application from one operating environment to another.

Database modernisation

The heart of the challenge was how much code and data was interacting with Oracle, and so their goal was to ‘disintegrate’ that dependency into a services-based architecture.

“Moving a database needs its own strategic planning:

Database movement needs to be planned out while keeping the end goal in sight, or else it can go very wrong. There are many decisions to be made, from storage prediction to absorbing at least a year’s worth of growth in data that translates into number of instances needed, licensing costs for both production and test environments, using RDS services vs. managing larger EC2 instances, ensuring that database architecture can address scalability, availability and reliability of data. Creating disaster recovery plan, planning minimal migration downtime possible and the list goes on. As part of this migration, we decided to migrate from licenced Oracle to open source MYSQL database running on Netflix managed EC2 instances.”

Overall this transformation scope and exercise included:

  • APIs and integrations: The legacy billing systems ran via batch job updates, integrating messaging updates from services such as gift cards; billing APIs are also fundamental to customer workflows such as signups, cancellations and address changes.
  • Globalisation: Some of the APIs needed to be multi-region and highly available, so data was split into multiple Cassandra data stores. A data migration tool was written that transformed member billing attributes spread across many tables in Oracle into a much smaller Cassandra structure (a toy sketch of the idea follows this list).
  • ACID: Payment processing needed ACID transactions, and so was migrated to MySQL. Netflix worked with the AWS team to develop a multi-region, scalable architecture for their MySQL master, with a DRBD copy and multiple read replicas in different regions, plus tooling and alerts for the MySQL instances to ensure monitoring and recovery as needed.
  • Data/code purging: To optimise how much data needed to be migrated, the team conducted a review with business teams to identify what data was still actually live, and from that review purged many unnecessary and obsolete data sets. As part of this housekeeping, obsolete code was also identified and removed.
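
As a toy illustration of the denormalising copy described in the globalisation item above, the sketch below flattens rows from a relational source into a single wide Cassandra table; it assumes the cx_Oracle and cassandra-driver client libraries, and every connection string, table and column name is hypothetical rather than Netflix's actual schema or tooling.

```python
# Hypothetical sketch only: copy billing attributes from a relational source
# into one denormalised Cassandra table. All names here are invented.
import cx_Oracle                        # legacy Oracle source
from cassandra.cluster import Cluster   # Cassandra target

oracle = cx_Oracle.connect("billing_ro/secret@legacy-db:1521/BILLING")
session = Cluster(["cassandra-1.internal"]).connect("billing")

insert = session.prepare(
    "INSERT INTO member_billing (member_id, country, plan_id, next_billing_date) "
    "VALUES (?, ?, ?, ?)"
)

cursor = oracle.cursor()
cursor.execute(
    "SELECT member_id, country, plan_id, next_billing_date FROM member_billing_view"
)
for row in cursor:
    # Many normalised rows in the source collapse into one wide row per member
    session.execute(insert, row)
```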

A headline challenge was the real-time aspect, ‘changing the tyre of the moving car’: migrating data to MySQL while it was constantly changing. This was achieved with Oracle GoldenGate, which could replicate their tables across heterogeneous databases and keep up with ongoing incremental changes. It took a heavy two-month testing period to complete the migration via this approach.

Downtime switchover

Downtime was needed for a data migration of this scale, and to mitigate the impact on users Netflix employed an approach of ‘decoupling user-facing flows to shield customer experience from downtimes or other migration impacts’.

All of their tooling was built around the ability to migrate one country at a time and funnel traffic as needed. They worked with the ecommerce and membership services teams to change the integration in user workflows to an asynchronous model, building retry capabilities to rerun failed processing and repeat as needed.
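
A minimal sketch of that retry idea, in generic Python rather than Netflix's actual tooling; the backoff parameters are arbitrary:

```python
# Generic retry with exponential backoff and jitter, illustrating the
# "rerun failed processing and repeat as needed" pattern described above.
import random
import time

def with_retries(operation, max_attempts=5, base_delay=1.0):
    """Run `operation`, retrying with exponential backoff and jitter on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            # back off exponentially, with jitter so retries do not stampede
            time.sleep(base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.5))
```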

An absolute requirement was SOX compliance, and for this Netflix made use of components from their open source (Netflix OSS) suite:

“Our cloud deployment tool Spinnaker was enhanced to capture details of deployment and pipe events to Chronos and our big data platform for auditability. We needed to enhance Cassandra client for authentication and auditable actions. We wrote new alerts using Atlas that would help us in monitoring our applications and data in the cloud.”

Building high availability, globally distributed cloud applications with AWS

Netflix provides a detailed, repeatable best-practice case study for implementing AWS cloud services at extremely large scale, and so is an ideal baseline candidate for any enterprise organisation facing the same types of scale challenges, especially with an emphasis on high availability (HA).

Two Netflix presentations, Globally Distributed Cloud Applications and From Clouds to Roots, provide a broad and deep review of their overall global architecture approach, in terms of exploiting AWS with the largest and most demanding of capacity and growth requirements, such as hosting tens of thousands of virtual server instances to operate the Netflix service, auto-scaling by 3k/day.

These presentations go into a granular level of detail on how Netflix monitors performance, and then focus specifically on High Availability architecture, providing a broad and deep blueprint for this scenario's requirements.

Netflix Spinnaker – global continuous delivery

In short, these address the two core, common requirements of enterprise organisations: their global footprint, with the associated application hosting and content delivery requirements, and their own software development practices, i.e. how best to optimise the IT and innovation processes that deploy the software systems needing this infrastructure.

Build code like Netflix – continuous deployment

The ideal of our ‘repo guide’ for the Netflix OSS suite is for it to function as a ‘recipe’ for others to follow, i.e. you too can build code like Netflix.

Therefore it’s apt that one of the best starting points is their blog post of the same title – How We Build Code At Netflix.

Most notably, this introduces the role of Continuous Deployment best practices, and how one of their modules, ‘Spinnaker’, is central to them.

Cloud native toolchain

In the blog Global Continuous Delivery With Spinnaker they explain how it addresses this scope of the code development lifecycle across global teams, and forms the backbone of their DevOps ‘toolchain’, integrating with other tools such as Git, Nebula, Jenkins and Bakery.

As they describe:

“Spinnaker is an open source multi-cloud continuous delivery platform for releasing software changes with high velocity and confidence. Spinnaker is designed with pluggability in mind; the platform aims to make it easy to extend and enhance cloud deployment models.”

Their own quoted inspirations include Jez Humble’s blog and book on Continuous Delivery, as well as experts such as Martin Fowler and working practices such as ‘Blue Green Deployments’.

Moving from Asgard

Their history leading up to the conception and deployment of Spinnaker is helpful reading too. Previously they used a tool called ‘Asgard’, and in Moving from Asgard they describe the limitations they reached with that type of tool, and how they instead sought a new tool that could:

  • “enable repeatable automated deployments captured as flexible pipelines and configurable pipeline stages
  • provide a global view across all the environments that an application passes through in its deployment pipeline
  • offer programmatic configuration and execution via a consistent and reliable API
  • be easy to configure, maintain, and extend”

These requirements took shape as Spinnaker and the deployment practices they describe, which you can reproduce via the GitHub download.

[session] WebRTC Potential for Edge Computing | @ThingsExpo @NTTCom #IoT #M2M #RTC WebRTC

Recently, WebRTC has attracted a lot of attention from the market. Its use cases are expanding: video chat, online education, online healthcare and so on. And it is no longer only for human-to-human communication; IoT use cases such as machine-to-human communication have also emerged recently. One typical use case is remote camera monitoring. With WebRTC, people gain interoperability and flexibility when deploying a monitoring service.
However, the benefit of WebRTC for IoT is not only its convenience and interoperability. It has a lot of potential to address current issues around IoT, such as security and connectivity, based on P2P technology. In his view, it will become a key component, especially in edge computing use cases.
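
As a rough sketch of the remote-camera idea, assuming the Python aiortc library and leaving out signalling (which a real deployment would need in order to exchange the offer and answer), a device-side peer might look like this:

```python
# Illustrative sketch: a device-side WebRTC peer that captures a local camera.
# Signalling with the monitoring client is deliberately omitted.
import asyncio

from aiortc import RTCPeerConnection
from aiortc.contrib.media import MediaPlayer

async def start_camera_peer():
    pc = RTCPeerConnection()
    # Capture the local camera; the device path assumes a Linux host with V4L2
    player = MediaPlayer("/dev/video0", format="v4l2")
    pc.addTrack(player.video)

    offer = await pc.createOffer()
    await pc.setLocalDescription(offer)
    # In a real system this SDP offer would be sent to the viewer via signalling
    print(pc.localDescription.sdp)
    return pc

if __name__ == "__main__":
    asyncio.run(start_camera_peer())
```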


[session] DNS, #DevOps and #DigitalTransformation | @CloudExpo #DX #SaaS

In his session at 21st Cloud Expo, Carl J. Levine, Senior Technical Evangelist for NS1, will objectively discuss how DNS is used to solve Digital Transformation challenges in large SaaS applications, CDNs, AdTech platforms, and other demanding use cases. A veteran of the Internet Infrastructure space, Levine has over a decade of experience with startups, networking protocols and Internet infrastructure, combined with the unique ability to iterate use cases, bring understanding to those seeking to explore complicated technical concepts and increase revenue across diverse sales channels.


[session] Dovetailing #DevOps and the Cloud | @DevOpsSummit @CAinc @Aruna13 #CloudNative

As DevOps methodologies expand their reach across the enterprise, organizations face the daunting challenge of adapting related cloud strategies to ensure optimal alignment, from managing complexity to ensuring proper governance. How can culture, automation, legacy apps and even budget be reexamined to enable this ongoing shift within the modern software factory?


[session] #DevOps vs SRE vs Cloud Native | @DevOpsSummit @RackNgo #Serverless #CloudNative

DevOps is under attack because developers don’t want to mess with infrastructure. They will happily own their code all the way into production, but they want to use platforms instead of raw automation. That’s changing the landscape we understand as DevOps, with both architectural concepts (CloudNative) and process redefinition (SRE).
Rob Hirschfeld’s recent work in Kubernetes operations has led to the conclusion that containers and related platforms have changed the way we should be thinking about DevOps and controlling infrastructure. The rise of Site Reliability Engineering (SRE) is part of that redefinition of operations vs development roles in organizations.


[session] #Serverless Computing for #IoT Device Simulation | @ThingsExpo #AI #DX #SmartCities

When shopping for a new data processing platform for IoT solutions, many development teams want to be able to test-drive options before making a choice. Yet when evaluating an IoT solution, it’s simply not feasible to do so at scale with physical devices. Building a sensor simulator is the next best choice; however, generating a realistic simulation at very high TPS with ease of configurability is a formidable challenge. When dealing with multiple application or transport protocols, you would be looking at some significant engineering investment.
On-demand, serverless computing enables developers to try out a fleet of devices on IoT gateways with ease. With a sensor simulator built on top of AWS Lambda, it’s possible to elastically generate device sensors that report their state to the cloud.
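
A minimal sketch of that idea, assuming an AWS Lambda handler using boto3 to publish to AWS IoT Core; the topic names, device count and payload shape are illustrative assumptions, not the session's actual implementation:

```python
# Illustrative Lambda handler: fabricate sensor readings for a fleet of
# simulated devices and publish each device's state to AWS IoT Core.
import json
import random

import boto3

iot = boto3.client("iot-data")

def handler(event, context):
    device_count = event.get("device_count", 100)
    for device_id in range(device_count):
        payload = {
            "device_id": f"sim-{device_id}",
            "temperature": round(random.uniform(18.0, 30.0), 2),
            "humidity": round(random.uniform(30.0, 70.0), 2),
        }
        # Each simulated device reports its state on its own MQTT topic
        iot.publish(
            topic=f"simulated/devices/sim-{device_id}/state",
            qos=0,
            payload=json.dumps(payload),
        )
    return {"published": device_count}
```

Invoking many such functions in parallel is one way to approach the elastic, high-TPS scale the session describes.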
