
A guide to planning for application resiliency in cloud environments


As businesses look to clouds for faster, more flexible growth, they confront significant challenges from a legacy application base that has varying levels of cloud suitability. Here, we examine how requirements around fault tolerance and disaster recovery can impact choice of cloud or architecture strategies within a cloud. 

Planning for application resiliency in cloud environments can present special challenges. Strategies can be similar to those used in traditional data centres, but implementations often differ.

At the base of the implementation differences is the architecture typically chosen for cloud applications.  Clouds tend to favour scaling “out” to more nodes rather than “up” to a bigger node.  This choice enables more graceful degradation in the event of node failure.  It also allows developers to add capacity in smaller units that can be finely tuned to immediate requirements, avoiding larger buys and attendant unused capacity.  Scaling out does, though, present different requirements for high availability.
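
To make the graceful degradation point concrete, here is a back-of-the-envelope sketch comparing how much capacity survives a single node failure in a scale-up versus a scale-out design; the node counts are illustrative, not a recommendation.

def remaining_capacity(total_nodes, failed_nodes=1):
    # Fraction of total capacity still available when failed_nodes are down.
    return (total_nodes - failed_nodes) / total_nodes

print(f"scale-up, 2 large nodes:  {remaining_capacity(2):.0%} of capacity survives one failure")
print(f"scale-out, 8 small nodes: {remaining_capacity(8):.1%} of capacity survives one failure")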

In cases where services are housed on equipment that is physically proximate, like a traditional data centre, strategies like virtual IPs and load balancers often suffice to manage even scale-out infrastructures.  Planning for availability and resilience across multiple geographies, though, can require detailed consideration and engineering around managing DNS services and sessions, request routing, and persistent storage management.  Cloud providers and implementations will vary in terms of providing services to support these requirements.
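
As a rough sketch of what health-check-driven routing across geographies implies, the Python below prefers the first healthy regional endpoint; the endpoint URLs are hypothetical, and a production deployment would more likely rely on managed DNS failover or global load balancing services than on application-level code like this.

import urllib.request

# Hypothetical regional health-check endpoints, in order of preference.
REGION_ENDPOINTS = [
    "https://eu-west.example.com/healthz",
    "https://us-east.example.com/healthz",
]

def healthy(url, timeout=2.0):
    # Treat any HTTP 200 within the timeout as healthy; anything else as down.
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def pick_endpoint():
    # Route to the first healthy region, falling back to the last entry.
    for url in REGION_ENDPOINTS:
        if healthy(url):
            return url
    return REGION_ENDPOINTS[-1]

print("routing requests to", pick_endpoint())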

Typical tiered applications or services (or microservices) rely on a core of persistent data stores, layers of business and application logic to manipulate or communicate that data, and presentation layers that expose an interface to the users or applications executing the business logic. Distributing these layers across multiple pieces of hardware typically involves detailed planning around state management, load balancing, and latencies. Caching layers are often intermingled with the core functional layers to drive more responsiveness out of the system, and these caches have their own requirements for distributed consistency and state management.
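
To illustrate the caching point, here is a minimal cache-aside sketch; the in-process dictionary and TTL stand in for a distributed cache, and load_profile_from_db is a hypothetical placeholder for a query against the persistent store.

import time

_cache = {}         # key -> (expiry timestamp, value); stands in for a distributed cache
TTL_SECONDS = 30.0  # how long a cached entry is treated as fresh

def load_profile_from_db(user_id):
    # Hypothetical placeholder for a query against the persistent data store.
    return {"id": user_id, "loaded_at": time.time()}

def get_profile(user_id):
    # Cache-aside read: serve from cache if fresh, otherwise reload and repopulate.
    now = time.time()
    entry = _cache.get(user_id)
    if entry and entry[0] > now:
        return entry[1]                        # cache hit (possibly slightly stale)
    value = load_profile_from_db(user_id)      # cache miss: go to the data store
    _cache[user_id] = (now + TTL_SECONDS, value)
    return value

print(get_profile("u123"))

The TTL is the consistency trade-off in miniature: it bounds how stale a cached read can be, which is exactly the sort of decision distributed caches force into the design.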

The core persistent data stores are particularly challenging with respect to resiliency and high availability. While databases implemented on physically proximate equipment have well understood clustering solutions that retain transactional integrity by synchronising duplicate data stores, distributed large-scale databases often require more thoughtful design. Solutions can range from asynchronous replication, which keeps replication latency out of the transaction flow, to data partitioning and the adoption of an “eventually consistent” paradigm for the underlying data. The specifics of the solution will depend on the application design and any requirements to limit data loss (Recovery Point Objective, or RPO), but there are well understood engineering patterns that accommodate common needs.
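
As a rough illustration of how replication lag translates into RPO under asynchronous replication, consider the following back-of-the-envelope estimate; the write rate and lag figures are hypothetical.

transactions_per_second = 500   # hypothetical write rate on the primary
replication_lag_seconds = 4.0   # hypothetical lag of the asynchronous replica

# If the primary is lost now, writes not yet replicated are lost with it.
transactions_at_risk = transactions_per_second * replication_lag_seconds
print(f"Worst-case loss at failover: ~{transactions_at_risk:.0f} transactions")
print(f"Effective RPO: roughly {replication_lag_seconds:.0f} seconds of data")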

A larger concern with distributed systems resiliency is organisational. All infrastructure environments manage multi-layered resilience complexity with a mix of vendor and in-house engineering. A typical non-cloud environment can leverage a more mature marketplace of vendor products and services facilitating the various layers. Resiliency in cloud environments may require more in-house engineering and less mature technologies to meet performance and availability goals for the applications or services. This often entails additional risk or organisational change to support the application. The trend toward “devops,” which fosters a closer working relationship between application engineers and systems administrators, is one key indicator of how these changes are playing out in the enterprise.

While moving applications into a private or public cloud environment may present an opportunity to save costs or improve operations, applications vary in their suitability for cloud infrastructures.  Some architectures (web farms, application server clusters, et al) are similar to cloud native best practice, and require little retooling to allow for resiliency.  More complex patterns are also manageable with proper planning, design, and execution.  Evaluating applications explicitly for resiliency requirements and fit against cloud native architectural principles allows firms to take best advantage of cloud economics and efficiencies in the enterprise.

Data placement in the cloud and regulations for US financial services


As momentum builds behind public cloud infrastructure solutions, even highly regulated industries like financial services are exploring their options. While regulations and security are often seen as stumbling blocks for public cloud acceptance in financial services, a review of the available U.S. regulatory guidance and privacy law suggests there may be compliance-friendly ways to move into the cloud. This article, of course, should not replace input from appropriate legal advisors. The research behind it focused on privacy law and requirements around U.S. banks rather than international issues that might arise from data placement in the cloud.

The U.S. banking industry has several regulatory bodies, including the Federal Reserve Board, the Federal Deposit Insurance Corporation, the National Credit Union Administration, the Office of the Comptroller of the Currency, and the Consumer Financial Protection Bureau. A consortium group called the Federal Financial Institutions Examination Council maintains a clearing house for guidance and regulatory information from the various regulators. On July 10, 2012, the Council issued a press release with guidance on public cloud utilization in banking.

The core of the guidance is that public cloud risk management should follow the same risk management principles as any outsourcing contract: “The Federal Financial Institution Examination Council Agencies consider cloud computing to be another form of outsourcing with the same basic risk characteristics and risk management requirements as traditional forms of outsourcing.”

The Council calls out some specific areas for attention by regulated entities.

  • Due diligence: Outsourcing to cloud doesn’t excuse entities from due diligence around data management. Good data practices like data classification, segregation, and ensuring recoverability should be followed.
  • Vendor management: Cloud service providers unfamiliar with the finance industry may require additional controls to provide appropriate checks and oversight.
  • Audit: Cloud services must preserve transparency and regulators’ ability to audit.
  • Information security: Information security analogous or consistent with internal practice must be maintained.
  • Legal, regulatory, and reputational considerations: Applicable law must be observed.
  • Business continuity planning: Appropriate plans around disaster recovery and data recoverability must be in place.

The Council makes special reference to legal considerations, and it is worth a deeper exploration of what those are. The primary privacy regulations affecting U.S. financial services are contained in the Gramm-Leach-Bliley Act (GLB).  GLB originated as a response to concerns about banks sharing detailed account-holder information with third parties for marketing or cross-selling purposes. 

The act specifies handling requirements for data deemed personally identifiable and non-public, and it bars sharing such data with third parties without explicit agreement from customers. GLB applies to a broad range of U.S. financial services firms, including banks, mortgage originators and servicers, and consumer credit agencies.

Because outsourcing infrastructure provisioning to a public cloud is, almost by definition, allowing third-party access to data, some specifics of GLB merit a closer look. Fundamentally, the law requires financial services firms to preserve the privacy of “non-public personal information” by enacting a variety of personnel and computer systems security policies. If a firm wants or needs to share information with a third party, the customer must be notified and given the opportunity to opt out.

Privacy obligations apply only to non-public personal information. This includes account numbers, information supplied on an application for an account, and even information in an internet cookie. What is not protected is non-identifiable aggregate data or information that can be gleaned from other public records (mortgage holders are often recorded in deed documents, for instance).

As to how to protect that information, firms (and, by extension, their third-party outsourcing partners) must implement a comprehensive security program to ensure the security and confidentiality of protected data, protecting against anticipated threats and unauthorized access. Such a program might be composed of employee training and management as well as information systems management.

Background checks and restricting data access to those employees with a need to know are cornerstones of secure data management. Other employee-focused policies include strong, frequently changed passwords, remote access restrictions, and prompt deactivation of departing personnel’s passwords and user names.

A security policy might also include procedures around information systems. Appropriate encryption, security patching, firewalls, intrusion detection systems, anti-virus, and other prophylactic security measures must be used.

When dealing with service providers, it is important to have agreements about these policies in writing to safeguard non-public information appropriately. Additionally, you should review policies and procedures with audits, testing, or other controls to monitor the provider’s compliance.

While these steps are unmistakably a burden, they are challenges most firms have shouldered in other outsourcing arrangements.

In summary, while there remains a lot of fear, uncertainty, and doubt around regulatory discussions with financial services in the public cloud, a review of the available information is not as forbidding as one might suppose. The guidance points to outsourcing as the model, and there is ample precedent and experience with outsourcing systems. U.S. financial firms hoping to take advantage of the speed, flexibility, and cost efficiencies of the public cloud should proceed, with caution, of course, but optimistically.

What advice would you share on the matter? Let us know in the comments.

Analysing security and regulatory concerns with cloud app migration


As businesses look to clouds for faster, more flexible growth, they confront significant challenges from a legacy application base that has varying levels of cloud suitability. It is therefore worth examining how security and compliance requirements can impact choice of cloud or architecture strategies within a cloud. 

Maintaining privacy for data that must be protected for business or regulatory reasons is a substantial concern in cloud migration, public or private. Security and compliance organisations vary in their acceptance of shared environments, even within a single organisation.

Public clouds particularly can highlight weaknesses in data security practices. As firms consider moving applications into a public cloud it is important to be sure basic policies and procedures are in place.  Data classification is a core practice that enables risk-aware protection of data as it moves to the cloud. Having a formal classification system defining what data is public, company confidential, or eyes only is the first step to being able to track compliance in an explicit way. Data life cycle management, decommissioning and disposal in particular, gain increased importance in public cloud.  Encryption and key management have new wrinkles when infrastructure is owned by outside parties.
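
A small sketch of how a formal classification scheme can be enforced in code before data is pushed to a public cloud; the labels and the policy table are hypothetical examples, not a standard.

# Hypothetical classification levels and where each may be stored.
POLICY = {
    "public":               {"public_cloud": True,  "encryption_required": False},
    "company_confidential": {"public_cloud": True,  "encryption_required": True},
    "eyes_only":            {"public_cloud": False, "encryption_required": True},
}

def allowed_in_public_cloud(classification):
    # Fail closed: unknown or missing classifications are treated as not allowed.
    rule = POLICY.get(classification)
    return bool(rule and rule["public_cloud"])

for label in ("public", "company_confidential", "eyes_only", "unlabelled"):
    print(f"{label:22s} -> public cloud permitted: {allowed_in_public_cloud(label)}")

Tying every dataset to a label like this is what makes later audit questions, such as which confidential data sits with which provider, answerable in an explicit way.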

In addition to basic security concerns, legal and regulatory issues for data management must also be considered. Distributing data in any cloud environment, public or private, can trip over legal restrictions in global enterprises. It is important to establish what is permissible in the jurisdictions where an application operates, and preventing cross-border flows of data is sometimes required. Similarly, for international applications, it is important to be aware of any requirements for electronic discovery, integrity, and data custody in the countries where data resides. Firms considering moving applications into the public cloud should require their providers to detail the legal requests the provider will respond to and the manner in which they will respond.

In the US, domestic regulatory concerns can also come into play with shared service environments, and they vary depending on the particular standards being enforced. Two common U.S. examples are information protected by the Health Insurance Portability and Accountability Act (HIPAA) and the data segregation required by the Payment Card Industry Data Security Standard (PCI-DSS) to maintain credit card privacy.

HIPAA’s primary focus is preserving administrative security and integrity controls around protected data. Core areas of concern are access control, audit points, and encryption of data at rest and in flight. The implementation of each of these varies based on the type of cloud model implemented. Virtualisation introduces additional control points and risks. Audit logs should be continuously reviewed for unwarranted access in any shared service environment.
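
As a minimal sketch of what continuous audit-log review can look like, the snippet below flags access by users outside an entitlement list; the log format, user names, and resources are all hypothetical.

# Hypothetical entitlements: users permitted to touch protected health records.
ENTITLED_USERS = {"dr_smith", "nurse_jones"}

# Hypothetical audit records as (user, resource) pairs.
audit_log = [
    ("dr_smith",    "patient/1001"),
    ("admin_temp",  "patient/1001"),   # not entitled: should be flagged
    ("nurse_jones", "patient/2002"),
]

def unwarranted_access(log):
    # Yield audit entries whose user is not on the entitlement list.
    for user, resource in log:
        if user not in ENTITLED_USERS:
            yield user, resource

for user, resource in unwarranted_access(audit_log):
    print(f"ALERT: {user} accessed {resource} without entitlement")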

There are vendor solutions that can help: role-based access control that grants administrators the least privilege needed to accomplish their tasks, and cloud volume encryption methods that keep keys unreadable to public cloud service providers. But planning and awareness are imperative to implementing protection that meets compliance obligations without being operationally burdensome.
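
The role-based access control idea boils down to checking each requested action against a role’s explicitly granted set; the roles and permission strings below are hypothetical, and in practice this is delegated to the platform’s IAM or RBAC layer rather than hand-rolled.

# Hypothetical role definitions following least privilege.
ROLE_PERMISSIONS = {
    "backup_operator": {"volume:snapshot", "volume:read"},
    "network_admin":   {"network:configure", "network:read"},
}

def authorised(role, action):
    # Allow an action only if the role's permission set explicitly contains it.
    return action in ROLE_PERMISSIONS.get(role, set())

print(authorised("backup_operator", "volume:snapshot"))  # True
print(authorised("backup_operator", "volume:delete"))    # False: never granted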

While HIPAA emphasises only administrative data access, PCI-DSS adds the burden of technical segregation of protected data.  In virtualised, shared environments, this can lead to difficult tradeoffs and increased diligence around audit readiness.

A virtual machine that contains data in scope for PCI-DSS compliance pulls the hosting hypervisor into scope, along with any other virtual machines hosted by the same hypervisor. That means all the co-hosted VMs must be treated with the same strict security management as those holding PCI-DSS protected data, even if they contain none themselves. “Compensating controls” and additional proofs may make logical separation compliant, but there is no one-size-fits-all method to meet PCI-DSS requirements. Specific controls and procedures will depend on how virtualisation is used and implemented.
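
That scope-propagation rule can be expressed directly: given a hypothetical inventory mapping VMs to hypervisors, any hypervisor hosting an in-scope VM, and every VM sharing that hypervisor, is pulled into scope.

# Hypothetical inventory: VM name -> hypervisor it runs on.
VM_TO_HYPERVISOR = {
    "payments-db":  "hv1",
    "web-frontend": "hv1",   # shares hv1 with the cardholder data VM
    "hr-portal":    "hv2",
}

# VMs known to hold cardholder data.
CARDHOLDER_VMS = {"payments-db"}

def pci_scope(vm_to_hv, cardholder_vms):
    # Return the hypervisors and VMs pulled into PCI-DSS scope by co-residency.
    hypervisors = {vm_to_hv[vm] for vm in cardholder_vms}
    vms = {vm for vm, hv in vm_to_hv.items() if hv in hypervisors}
    return hypervisors, vms

hvs, vms = pci_scope(VM_TO_HYPERVISOR, CARDHOLDER_VMS)
print("Hypervisors in scope:", sorted(hvs))
print("VMs in scope:", sorted(vms))   # web-frontend is dragged in despite holding no card data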

If, in addition to sharing hardware inside an organisation, you are also sharing that hardware with other organisations in a public cloud, there is additional complexity. Multi-tenancy, nested service-provider relationships, and converged administrative responsibilities can all expand compliance requirements in undesirable ways. Many public clouds, including Google Compute Engine, Amazon Web Services, and Microsoft Azure, market themselves as PCI-DSS compliant, but it is important to note that the final burden of compliance for an application falls to the client organisation. Specific responsibilities and proofs for compliance should be carefully spelled out when contracting for cloud services requiring PCI-DSS compliance.

Security and compliance requirements may be met in cloud environments, but special attention and close cooperation with a firm’s risk management organisation are called for.

In summary, while moving applications into a private or public cloud environment may present an opportunity to save costs or improve operations, applications vary in their requirements for data security, privacy and management.  The common concerns presented here can add complexity but are manageable with proper care around planning, design, and execution.  Performing detailed application analysis for cloud security requirements can create evidence-based guidance on migration costs and requirements rather than letting panic-stricken hand-waving dictate decisions around whether and what to migrate.

How performance requirements can prove a stumbling block for cloud transformation


As businesses look to cloud for faster, more flexible growth, they confront significant challenges from a legacy application base that has varying levels of cloud suitability.

Some applications have specific kinds of performance requirements that may limit or eliminate their eligibility for virtualisation, which is fundamental to optimising cloud efficiencies. These performance characteristics come in various flavours: a requirement for specialty hardware, a requirement for particularly extreme CPU or memory utilisation, or a requirement for real-time deterministic performance.

Under the hood

The core value proposition of virtualisation and cloud is using a single standard hardware platform across a wide range of applications or services. In some cases, though, specialised hardware might be required to provide some function most effectively. Sophisticated maths can sometimes benefit from graphics processing units (GPUs), high-throughput systems benefit from solid-state disk, and cryptographic applications can use hardware random number generators, which can be difficult to come by in virtual environments. Low latency stock trading environments often use specialised low-latency switches and network taps for operations and instrumentation. While a specialty hardware requirement may not prevent migrating an application to a cloud environment, it may limit the vendor options or require special accommodation in the case of private clouds.

Another common obstacle to virtualisation is a requirement for raw processing horsepower. Virtualisation consumes some CPU cycles to manage the various virtual machines running on the server. Maximising the CPU available to an application therefore means running only one virtual machine on that hardware, at which point the question becomes whether it is cost-effective to use a hypervisor at all.

A more subtle performance requirement that is particularly troublesome in shared service environments is deterministic performance. Some transactions or activities need to happen within a fixed – often short – amount of time. Software-defined networking (SDN) solutions, some types of live media broadcasting, real-time big data analytics applications, and algorithmic trading platforms all benefit from deterministic, consistent performance. Cloud-style shared, virtualised resource provisioning is subject to the “noisy neighbour” problem: without workload planning and engineering, it is difficult to know what other applications might be running on the same hardware. The result is unpredictable performance patterns that can impact the usability of the application.
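
One way to make the determinism concern measurable is to track tail latency alongside the average. The sketch below times a hypothetical operation and compares the median with an approximate 99th percentile; a wide gap between the two is the classic noisy neighbour signature.

import random
import statistics
import time

def sample_operation():
    # Hypothetical stand-in for the latency-sensitive operation being measured.
    time.sleep(random.uniform(0.001, 0.005))

latencies = []
for _ in range(200):
    start = time.perf_counter()
    sample_operation()
    latencies.append((time.perf_counter() - start) * 1000.0)  # milliseconds

latencies.sort()
median = statistics.median(latencies)
p99 = latencies[int(0.99 * (len(latencies) - 1))]  # approximate 99th percentile
print(f"median: {median:.2f} ms, p99: {p99:.2f} ms, ratio: {p99 / median:.1f}x")
# A p99 several times the median points to contention rather than a raw capacity shortfall.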

While this issue is most obvious in multi-tenant public clouds, private clouds, where there is more knowledge of the overall workload, can be problematic as well. Mixing different operating environments, commonly done as a cost or flexibility measure, can create issues. The classic example is sharing hardware between development and QA or production.  Development virtual machines can often come and go, with different performance characteristics between each machine and its predecessor.  If hardware is shared with production, this may influence production performance. While horsepower performance may be satisfactory, the “noise” brought on by changing out virtual machines may create unacceptable inconsistency.

Various strategies can be used to manage performance capacity more actively. In a private cloud, it is possible to be selective about how VMs are packed onto hardware and how workloads are deployed. In public clouds, buying larger instances might limit the number of neighbours. Another strategy is to forgo virtualisation altogether and run on bare hardware while trying to retain some of the self-service and on-demand qualities of cloud computing. The market is responding in this area with bare metal options from providers like Rackspace and IBM, and the open source cloud platform OpenStack has a sub-project, Ironic, that brings bare metal provisioning into the same framework as virtual machine provisioning.
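
In a private cloud, being selective about packing can be as simple as steering latency-sensitive workloads away from hosts already running volatile workloads. The host inventory and labels below are hypothetical; real schedulers express the same idea through anti-affinity rules or host aggregates.

# Hypothetical host inventory: free vCPUs and whether volatile ("noisy") workloads run there.
HOSTS = [
    {"name": "host-a", "free_vcpus": 16, "noisy": True},
    {"name": "host-b", "free_vcpus": 8,  "noisy": False},
    {"name": "host-c", "free_vcpus": 4,  "noisy": False},
]

def place(vcpus_needed, latency_sensitive):
    # Pick the host with the most headroom, skipping noisy hosts for sensitive workloads.
    candidates = [h for h in HOSTS
                  if h["free_vcpus"] >= vcpus_needed
                  and not (latency_sensitive and h["noisy"])]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h["free_vcpus"])["name"]

print(place(4, latency_sensitive=True))   # host-b: host-a has more headroom but is noisy
print(place(4, latency_sensitive=False))  # host-a: headroom wins when jitter is tolerable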

Latency and congestion

Along with processing and memory, network and storage latency requirements for an application should be evaluated. Latency is the amount of time it takes a packet to traverse a network end to end. Since individual transactions can use several packets (or blocks, for storage), the round trip time can add up quickly.  While there are strategies to offset latency – TCP flow control, multi-threaded applications like web servers – some applications remain latency sensitive.

There are three primary latency areas to examine. First, latency between the application and the end user or client can create performance issues. In today’s applications, that connection is commonly a web server and browser. Modern web servers and browsers generally use multi-threading to fetch multiple page components at once, so the latency issue is masked by running multiple connections. But there are instances where code downloaded to the browser (Flash, Java, HTML5) relies on single-threaded connectivity to the server. It is also possible the communication itself is structured in a way that exacerbates latency issues (retrieving database tables row by row, for example). Finally, an application may have a custom client that is latency sensitive.
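
The row-by-row example is worth quantifying, because every round trip pays the full network latency; the latency and row counts below are hypothetical.

round_trip_ms = 2.0   # hypothetical network round-trip latency
rows = 10_000         # hypothetical result set size

row_by_row_ms = rows * round_trip_ms              # one request per row
batch_size = 100
batched_ms = (rows / batch_size) * round_trip_ms  # 100 rows per request

print(f"row-by-row: {row_by_row_ms / 1000:.1f} s spent purely on round trips")
print(f"batched:    {batched_ms / 1000:.1f} s spent purely on round trips")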

The second primary network latency area is storage latency, or the latency between the application and data.  This will show up when application code does not have direct local access to the back-end storage, be it a database or some other persistent store. This is a common case in cloud environments where the storage interface for the compute node tends not to be directly attached. 

As a result, there is network latency due to the storage network and latency due to the underlying storage hardware responsiveness. In public clouds, storage traffic also competes with traffic from other customers of the cloud provider. Latency can build up around any of these areas, particularly if an application uses smaller reads and writes. Writing individual transactions to database logs is a good example of this, and special attention to IO tuning for transaction logs is common. Storage latencies and bandwidth requirements can be offset by running multiple VMs, multiple storage volumes, or by using a solid state disk based service, but the choice will impact the functional and financial requirements of the cloud solution.
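
A back-of-the-envelope model shows why small synchronous writes are latency-bound; the per-IO latency and sizes are hypothetical, and the model ignores queueing, parallelism, and bandwidth caps.

latency_ms = 1.0  # hypothetical round-trip latency per IO to networked cloud storage

def throughput_mb_per_s(io_size_kb, latency_ms):
    # Throughput of one synchronous writer issuing a single IO at a time.
    ios_per_second = 1000.0 / latency_ms
    return ios_per_second * io_size_kb / 1024.0

for io_kb in (4, 64, 1024):
    print(f"{io_kb:5d} KB IOs -> ~{throughput_mb_per_s(io_kb, latency_ms):7.1f} MB/s")
# Small, log-style writes barely move data; batching, parallel IO, or faster media hide the latency.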

The third primary latency area is between cooperating applications; that is, application code may depend on some other application or service to do its job. In this case, the network latency between the applications (along with latencies in the applications themselves) can often cause slowness in the overall end-user or service delivery experience.

Depending on the specifics of the cloud solution being considered, these various latencies may be LAN, SAN, WAN, or even Internet dependent. Careful analysis of the latencies in an application and the likely impact of the planned cloud implementation is warranted. Often, as with raw performance above, consistency rather than raw speed is important.

Conclusion

In short, while moving applications into a private or public cloud environment may present an opportunity to save costs or improve operations, applications vary in their suitability for cloud infrastructures.  The common technical concerns presented here can add complexity but are manageable with proper planning, design, and execution.  Evaluating applications for cloud readiness allows evidence-based planning to take best advantage of cloud economics and efficiencies in the enterprise.