Designing new cloud architectures: Exploring CI/CD – from data centre to cloud

Today, most companies use continuous integration and continuous delivery (CI/CD) in one form or another – and this matters for several reasons:

  • It increases the quality of the code base and of the tests that cover it
  • It greatly increases team collaboration
  • It reduces the time it takes for new features to reach the production environment
  • It reduces the number of bugs that reach the production environment

Granted, these benefits apply if – and only if – CI/CD is implemented largely correctly. Although there is no single perfect way of doing CI/CD, there are best practices to follow and pitfalls to avoid.

Some of the problems that arise when it is done poorly include: the build breaking frequently; the velocity at which new features are pushed creating havoc for the testing teams or even the client acceptance team; features reaching production without proper or sufficient testing; big releases becoming hard to track or to split up; and old-school engineers struggling to adapt to the style.

IaC

A few years ago, the prevailing thinking was that CI/CD was only useful for the product itself; that it would only affect the development team, and that operations teams were merely there to support the development lifecycle. This development-centric approach came to an abrupt end when new technologies appeared and captivated the IT market: the technologies that allow infrastructure to be defined as code.
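
To make the idea concrete, below is a minimal infrastructure-as-code sketch using the AWS CDK for Python – one option among many (Terraform, CloudFormation and Pulumi are equally valid). The stack and bucket names are illustrative; the point is that infrastructure is declared in code, versioned alongside the application and deployed through the same pipeline.

    # Minimal IaC sketch with the AWS CDK for Python (v2); names are illustrative.
    from aws_cdk import App, Stack, RemovalPolicy
    from aws_cdk import aws_s3 as s3
    from constructs import Construct

    class BuildArtefactsStack(Stack):
        """Declares the infrastructure; 'cdk deploy' creates or updates it."""

        def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
            super().__init__(scope, construct_id, **kwargs)
            # A versioned bucket to hold pipeline build artefacts.
            s3.Bucket(
                self,
                "ArtefactBucket",
                versioned=True,
                removal_policy=RemovalPolicy.DESTROY,
            )

    app = App()
    BuildArtefactsStack(app, "build-artefacts")
    app.synth()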

CI/CD is no longer exclusive to development teams. Its umbrella has expanded to cover engineering as a whole: software engineers, infrastructure, network and systems engineers, and so forth.

DevOps

Nobody knows what DevOps really is, but if you are not doing, using, breathing, dreaming – being? – DevOps, you’re doing it wrong. All teasing aside, with the advent of DevOps, the gap that existed between development and operations teams has narrowed, to the extent that some companies have merged the teams. Others have taken a different approach and run multidisciplinary teams in which engineers work on the product throughout its lifecycle – coding, testing and deploying – on occasion bringing security in as well, an arrangement now called DevSecOps.

As the DevOps movement grows in popularity, so does CI/CD, since it is a major component of it. Not doing CI/CD means not doing DevOps.

From data centre to cloud

Having covered these terms and concepts, it is clear why CI/CD is so important. Since architectures and abstraction levels change when migrating a product from the data centre to the cloud, it becomes necessary to evaluate what is needed in the new ecosystem, for two reasons:

  • To take advantage of what the cloud has to offer, in terms of the new paradigm and the plethora of options
  • To avoid making the mistake of treating the cloud as a data centre and building everything from scratch

Necessary considerations

The CI/CD implementation used in the cloud should satisfy most of the following criteria:

  • Provided as a service: The cloud is XaaS-centric, and building things from scratch should be avoided. If something does have to be built from scratch and it is neither an in-house component nor a value-added product feature, I would suggest reviewing the architecture and asking for a sound business justification
  • Easy to get in, easy to get out: An uncomplicated onboarding and offboarding process suggests that the inner workings of the implementation are uncomplicated as well. And if it does not work as expected, an easy way out is always a necessity
  • Portable configuration: This is a nice-to-have. A portable configuration avoids reinventing the wheel and having to learn each implementation’s details in depth, and it makes moving from one system to another easier. Typical configurations are expressed in YAML or JSON – however, many providers also allow familiar languages such as Python, Java or JavaScript to suit the customer (a sketch of this idea follows the list)
  • Integration with VCS as a service: This is practically a given. As an example, Bitbucket provides pipelines within a repository. AWS does it differently with CodeCommit, which provides Git repositories as a service. Different cloud providers employ different approaches, and some integrate with external repositories as well
  • Artefact store: It depends on the type of application, but having an artefact store for the output of the build is often a good idea. Once the delivery stage is done, deploying to production is significantly easier if everything is packaged neatly
  • Statistics and metric visualisation: Visibility into what is occurring throughout the entire pipeline – which tests are failing, which features are ready, which pipeline is having problems – and, analogously, into the code base, not to mention the staging/testing/UAT or similar systems prior to production
  • No hidden fees: Although the technological part is important, the financial and economic part matters just as much. In the cloud, most costs turn into OpEx, and resources left running unused can have a significant impact. In terms of pipelines, it is important to focus on the cost of build minutes per month, the cost of storage per GB for the VCS and the artefact store, the cost per parallel pipeline, and the cost of the testing infrastructure, among other things. Being fully aware of the minutiae and reading the fine print pays off
  • Alerts and notifications: Mainly in case of failure, but setting minimum and maximum thresholds – for the number of commits, for example – can also yield substantial information; nobody committing frequently to the code base may mean the DevOps chain is breaking down
  • Test environments easy to create/destroy: The less manual intervention, the better; creating and tearing down environments needs to be automated and integrated into the pipeline (a sketch of this follows the list)
  • Easy ‘delivery to deployment’ integration: The sign-off after the delivery stage will be a manual step, but only one that then triggers a set of automated steps. Long gone are the days in which an operator ran a code upgrade manually
  • Fast, error-free rollback: When problems arise after a deployment, the rollback must be easy, fast and, above all, automatic or at least semi-automatic. Human intervention at this stage is a recipe for disaster (a sketch of this follows the list)
  • Branched testing: Having a single pipeline and only performing CI/CD on the master branch is a poor idea – not least because breaking the build would then affect everyone else’s work
  • Extensive testing suite: This is not necessarily cloud-only, but it is significant. At a minimum, four of the following should exist: unit, integration, acceptance, smoke, capacity, performance and UI/UX testing
  • Build environment as a service: Some cloud providers allow for virtualised build environments; Bitbucket Pipelines, for instance, integrates with Docker and Docker Hub for the build environment
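
On portable configuration: the sketch below keeps the pipeline definition as plain data and emits it as either YAML or JSON, whichever format a target system expects. The image and step names are illustrative rather than any provider’s actual schema, and PyYAML is assumed to be installed.

    # Hold the pipeline as plain data; serialise to whichever format is needed.
    import json

    import yaml  # PyYAML, installed separately

    pipeline = {
        "image": "python:3.12",
        "steps": [
            {"name": "build", "script": ["pip install -r requirements.txt"]},
            {"name": "test", "script": ["pytest -q"]},
            {"name": "package", "script": ["python -m build"]},
        ],
    }

    print(yaml.safe_dump(pipeline, sort_keys=False))  # YAML for one provider
    print(json.dumps(pipeline, indent=2))             # JSON for another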
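
On test environments that are easy to create and destroy: a hedged sketch using boto3 and AWS CloudFormation, in which a whole environment is stood up from a template before the test run and torn down afterwards. The stack name and template are illustrative.

    # Ephemeral test environment: create before the test run, destroy after it.
    import boto3

    cfn = boto3.client("cloudformation")

    def create_test_environment(stack_name: str, template_body: str) -> None:
        """Spin up a throwaway environment from a CloudFormation template."""
        cfn.create_stack(StackName=stack_name, TemplateBody=template_body)
        cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)

    def destroy_test_environment(stack_name: str) -> None:
        """Tear the environment down once the run has finished."""
        cfn.delete_stack(StackName=stack_name)
        cfn.get_waiter("stack_delete_complete").wait(StackName=stack_name)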
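
On fast rollback: a hedged sketch that re-points an Amazon ECS service at the last known-good task definition revision with a single call. Cluster, service and revision names are illustrative; in practice the pipeline would record the last good revision at deployment time.

    # Semi-automatic rollback: one call, no manual console work.
    import boto3

    ecs = boto3.client("ecs")

    def rollback(cluster: str, service: str, last_good_task_definition: str) -> None:
        """Re-point the service at a previous task definition, e.g. 'web-app:41'."""
        ecs.update_service(
            cluster=cluster,
            service=service,
            taskDefinition=last_good_task_definition,
            forceNewDeployment=True,
        )
        # Block until the service is stable on the previous revision.
        ecs.get_waiter("services_stable").wait(cluster=cluster, services=[service])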

Monitoring, metrics, and continuous tracking of the production environment

The show is not over once deployment happens. It is at that moment, and afterwards, that it becomes critical to keep track of what is occurring. Any glitch or problem can potentially snowball into an outage; thus it is important to extract as many metrics and monitor as many sensors as possible without losing track of the important things. By this, I mean establishing priorities to avoid generating chaos among the engineers on call and at their desks.

Most cloud providers offer an XaaS for monitoring, metrics, logs and alerts, plus integration with external systems. For instance, AWS provides CloudWatch, which delivers all of this as an integrated service. Google Cloud provides Stackdriver, a similar service; Microsoft has a slightly more basic offering in Azure Monitor. Another giant, Alibaba, provides Cloud Monitor at a similar level to the competition. Needless to say, every major cloud provides this as a service at one level or another.
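
As a concrete illustration, the hedged sketch below uses boto3 to create a CloudWatch alarm on a load balancer’s 5xx error count and route it to an on-call notification topic. The metric dimension value and the SNS topic ARN are illustrative.

    # Alarm when the load balancer returns too many 5xx responses.
    import boto3

    cloudwatch = boto3.client("cloudwatch")

    cloudwatch.put_metric_alarm(
        AlarmName="prod-5xx-errors",
        Namespace="AWS/ApplicationELB",
        MetricName="HTTPCode_Target_5XX_Count",
        Dimensions=[{"Name": "LoadBalancer", "Value": "app/prod-alb/1234567890abcdef"}],
        Statistic="Sum",
        Period=60,                    # one-minute windows...
        EvaluationPeriods=5,          # ...evaluated over five consecutive periods
        Threshold=50,
        ComparisonOperator="GreaterThanThreshold",
        TreatMissingData="notBreaching",
        AlarmActions=["arn:aws:sns:eu-west-1:123456789012:oncall-alerts"],
    )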

Monitoring is an essential component and must not go unnoticed – I cannot emphasise this enough. Even if the cloud provider does not offer such a service, it must integrate with monitoring services from other cloud-oriented providers, such as Dynatrace, which integrates with the most popular enterprise clouds.

Conclusion

CI/CD is a major component of the technology process. It can make or break your product in the cloud as well as in the data centre; evaluating the list above when designing a new cloud architecture can save significant time, money and effort.

When designing a cloud architecture, it is fundamentally important to avoid copying the current architecture and instead to design as if the application were cloud native – born to run in the cloud, together with its entire lifecycle. As mentioned previously, once a first architecture has been proposed and peer reviewed, a list of important caveats must be brought to attention before moving on to a more solid version of the architecture.

As a final comment, doing CI/CD halfway is better than not doing it at all. Some engineers and authors may argue that it is a binary decision – either there is CI/CD or there is not. I would rather argue that every small improvement gained by adopting CI/CD – or CI or CD alone, even in stages – is a win. In racing, whether it is by a mile or a metre, a win is a win.

Happy architecting and let us explore the cloud in depth.