All posts by Danny Bradbury

Automate your software builds with Jenkins


Danny Bradbury

3 Mar, 2021

Software developers can work well alone, if they’re in control of all their software assets and tests. Things get trickier, however, when they have to work as part of a team on a fast-moving project with lots of releases. A group of developers can contribute their code to the same source repository, like Git, but they then have to run all the necessary tests to ensure things are working smoothly. Assuming the tests pass, they must build those source files into executable binaries, and then deploy them. That’s a daunting task that takes a lot of time and organisation on larger software projects.

This is what Jenkins is for. It’s an open-source tool that co-ordinates those stages into a pipeline. This makes it a useful tool for DevOps, a development and deployment approach that automates the various stages of building software, creating an efficient conveyor belt system. Teams that get DevOps right with tools like Jenkins can move from version roll-outs every few months to every few days (or even hours), confident that all their tests have been passed.

Jenkins used to be called Hudson, but its development community renamed it after a trademark dispute with Oracle, which claimed the original name. It’s free, runs on operating systems including Windows, macOS, and Linux, and can also run as a Docker image.

You can get Jenkins as a download from the Jenkins.io website, but you’ll need the Java runtime environment installed to support it. Alternatively, you can install it as a Docker container by following the instructions on the official Jenkins site, which is what we’ll do here. Docker takes a little extra work to set up, but the advantage is twofold: it solves some dependency problems you might run into with Java, and it lets you recreate your Jenkins install on any server by copying your Docker file and the docker run command that launches it, which we put into a shell script for convenience. The Jenkins Docker instructions also install a souped-up user interface called Blue Ocean. If you don’t follow the Docker instructions, you can install Blue Ocean separately as a plugin.
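For reference, the kind of docker run command the Jenkins Docker instructions describe looks roughly like this. We’re assuming the jenkinsci/blueocean image that the Jenkins tutorials used at the time of writing, and a container name of jenkins; check the official instructions for the current image name and flags:

docker run --detach --rm --name jenkins \
  --publish 8080:8080 --publish 50000:50000 \
  --volume jenkins-data:/var/jenkins_home \
  jenkinsci/blueocean

The named jenkins-data volume keeps your Jenkins configuration and build history even if the container is removed, which is part of what makes the Docker route so easy to reproduce on another server.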

First, we must create a Python program for Jenkins to work with. We created a simple file called test_myapp.py, stored on our Linux system in /home/$USER/python/myapp. It includes a basic test using the PyTest utility:

#test_capitalization

def capitalize_word(word):
    return word.capitalize()

def test_capitalize_word():
    assert capitalize_word('python') == 'Python'

Create a Git repository for it using git init. Commit the file to your repo using git add ., and then git commit -m "first commit".

Now it’s time to start Jenkins using the docker run command in the Jenkins team’s Docker instructions. Once Jenkins is running, you can access it at localhost:8080. It will initially show you a screen with a directory path to a file containing your secure first-time access password. Copy the contents of that file to log into the administration screen, which will then set up the basic plugins you need to work with the software. Then, you can create a new user account for yourself.
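If you went the Docker route, one quick way to read that password file (assuming you named your container jenkins, as in the sketch above) is:

docker exec jenkins cat /var/jenkins_home/secrets/initialAdminPassword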

If you’re not already in the Blue Ocean interface, click on that option in the left sidebar. It will ask you to create a project. Call it myapp and then select Git as the project type in the Blue Ocean interface.

Blue Ocean will now ask you to create a pipeline. Click yes. We’re going to write this pipeline ourselves in a ‘Jenkinsfile’, which we’ll store in our myapp folder.

A Jenkinsfile is a text file describing your pipeline. It contains instructions for the stages of each build. It looks like this:

pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                <steps for this stage go here>
            }
        }
        stage('Test') {
            steps {
                <steps for this stage go here>
            }
        }
        stage('Deploy') {
            steps {
                <steps for this stage go here>
            }
        }
    }
}

Each stage reflects a step in the build pipeline and we can have as many as we like. Let’s flesh out this template.

Python programs don’t need building and deploying in the same way that, say, C++ programs do, because they’re interpreted. Nevertheless, Jenkins is useful in other ways. We can test our code automatically, and we can also check its formatting to ensure that it’s easy for other developers to read.

To do this, we need to compile the Python program into bytecode, which is an intermediate stage that happens when you run Python programs. We’ll call that our build stage. Here’s the Jenkinsfile for that step:

pipeline {
    agent none
    stages {
        stage('Build') {
            agent {
                docker {
                    image 'python:2-alpine'
                }
            }
            steps {
                sh 'python -m py_compile test_myapp.py'
                stash(name: 'compiled-results', includes: '*.py*')
            }
        }
    }
}

The agent is the external program that runs this stage. We don’t define a global one, but we do define one for the individual stage. In this case, because we’re using Docker, it’s a lightweight Alpine container with a Python implementation.

For our step, we run a shell command that compiles our Python file.

Save this in your project folder as 'Jenkinsfile' and then commit it using git add . and git commit -m "add Jenkinsfile".

Back in the UI, ignore Blue Ocean’s prompt to create a pipeline. Once it spots the Jenkinsfile in your repo, it’ll build the pipeline from that automatically. Go back to the main Jenkins dashboard by clicking the exit icon next to the Logout option at the top right, or by clicking the Jenkins name at the top left of the screen. Look for your new project in the dashboard and, on the left, select Scan Multibranch Pipeline Now.

Wait for a few seconds and Jenkins will scan your Git repo and run the build. Go back into the Blue Ocean interface, and all being well you’ll see a sunny icon underneath the HEALTH entry that shows the build succeeded. Click on myapp, then Branch indexing, and it’ll give you a picture of your pipeline and a detailed log.

Now we will add a test stage. Update your code to look like this:

pipeline {
    agent none
    stages {
        stage('Build') {
            agent {
                docker {
                    image 'python:2-alpine'
                }
            }
            steps {
                sh 'python -m py_compile test_myapp.py'
                stash(name: 'compiled-results', includes: '*.py*')
            }
        }
        stage('Test') {
            agent {
                docker {
                    image 'qnib/pytest'
                }
            }
            steps {
                sh 'py.test --verbose --junit-xml test-results/results.xml test_myapp.py'
            }
            post {
                always {
                    junit 'test-results/results.xml'
                }
            }
        }
    }
}

We’re using another Docker container to run the simple PyTest test we included in our test_myapp.py file. Save this file and update your repo with another git add . and git commit -m "add test stage to Jenkinsfile". Then, scan the multibranch pipeline as before. When you drop into Blue Ocean, you’ll hopefully see success once again. Note that Docker stores everything it runs in its own volume, along with the results. Although you can work some command line magic to access those files directly, you don’t need to; Jenkins shows you those assets in its UI. Click on the latest stage to open the build details, and find the entry that says py.test --verbose --junit-xml test-results/results.xml test_myapp.py. Clicking on that shows you the results of your test:

Everything passed! Now we’re going to bring it home with the final stage in our demo pipeline: checking the code formatting. There are specific rules for formatting Python code, as outlined in the language’s PEP 8 style guide. We’ll update our Jenkinsfile to use a tool called PyLint that will check our code. Here’s the full Jenkinsfile for all three stages of our pipeline:

pipeline {
    agent none
    stages {
        stage('Build') {
            agent {
                docker {
                    image 'python:2-alpine'
                }
            }
            steps {
                sh 'python -m py_compile test_myapp.py'
                stash(name: 'compiled-results', includes: '*.py*')
            }
        }
        stage('Test') {
            agent {
                docker {
                    image 'qnib/pytest'
                }
            }
            steps {
                sh 'py.test --verbose --junit-xml test-results/results.xml test_myapp.py'
            }
            post {
                always {
                    junit 'test-results/results.xml'
                }
            }
        }
        stage('Lint') {
            agent {
                docker {
                    image 'eeacms/pylint'
                }
            }
            environment {
                VOLUME = '$(pwd)/test_myapp.py'
                IMAGE = 'eeacms/pylint'
            }
            steps {
                withEnv(['PYLINTHOME=.']) {
                    sh "pylint ${VOLUME}"
                }
            }
        }
    }
}

Follow the same steps as before: save the file, commit it to your Git repo so that Jenkins sees it, and then rescan the multi-branch pipeline. Then go into Blue Ocean and look at the result. Oh no!

The pipeline stage failed! That’s because our code is badly formatted, and PyLint tells us why. We’ll update our test_myapp.py file to make the code compliant:

"""
Program to capitalize input
"""

#test_capitalization

def capitalize_word(word):
    """Capitalize a word"""
    return word.capitalize()

def test_capitalize_word():
    """Test to ensure it capitalizes a word correctly"""
    assert capitalize_word('python') == 'Python'

Now, save, commit to your repo, and rescan. Blue Ocean shows that we fixed it (note that in our demo it took us a couple of runs at the Python code to get the formatting right).

You could run all these steps manually yourself, but the beauty of Jenkins is that it automates them all for faster development. That makes the tool invaluable for developers working on a fast cadence as part of a team, but even a single freelance dev, or a hobbyist working on open-source projects, can use this to refine their practice.

Amazon updates development environment powering Alexa


Danny Bradbury

2 Feb, 2021

Amazon has announced an update to Lex, its conversational artificial intelligence (AI) interface service for applications, in order to make it easier to build bots with support for multiple languages.

Lex is the cloud-based service that powers Amazon’s Alexa speech-based virtual assistant. The company also offers it as a service that allows people to build virtual agents, conversational IVR systems, self-service chatbots, or informational bots.

Organizations define conversational flows using a management console that then produces a bot they can attach to various applications, like Facebook Messenger.

The company released its Version 2 enhancements and a collection of updates to the application programming interface (API) used to access the service.

One of the biggest V2 enhancements is the additional language support. Developers can now add multiple languages to a single bot, managing them collectively throughout the development and deployment process. 

According to Martin Beeby, principal advocate for Amazon Web Services, developers can add new languages during development and switch between them to compare conversations.

The updated development tooling also simplifies version control to track different bot versions more easily. Previously, developers had to version a bot’s underlying components individually, but the new feature allows them to version at the bot level.

Lex also comes with new productivity features, including saving partially completed bots and uploading sample utterances in bulk. A new configuration process makes it easier for developers to understand where they are in their bot’s configuration, Beeby added.

Finally, Lex now features a streaming conversation API that can handle interruptions in the conversation flow. It can accommodate typical conversational speed bumps, such as a user pausing to think or asking to hold for a moment while looking up some information.

Cisco’s new SD-WAN routers bring 5G and virtualisation to enterprises


Danny Bradbury

27 Jan, 2021

Cisco has launched four new devices in its Catalyst 8000 Edge series. They include a 5G cellular gateway, two Virtual CPE Edge units, and a heavy-duty aggregation router, all designed to support SD-WAN networking.

The company debuted its Catalyst 8000 family last October, and has now expanded the range with the Cisco Catalyst 8500L, the latest addition to the Catalyst 8500 Edge family. This is a collection of what Cisco calls aggregated service routers designed with large enterprises and cloud service providers in mind. They’re an evolution of the company’s ASR 1000 series of routers. 

Cisco designed the Catalyst 8500L for SD-WAN use cases, where wide area network (WAN) connectivity is configurable via software. It also supports the emerging secure access service edge (SASE) model that Gartner defined in mid-2019, which builds on SD-WAN to offer zero-trust networking capabilities for secure access.

The 8500L is an SD-WAN capable router that offers WAN connectivity at speeds of up to 10 Gbps. Rackable in a 1RU form factor, it runs either at core sites or colocation sites and features twelve x86 cores and up to 64 GB of memory, which Cisco says will support secure connectivity for thousands of remote sites. It also includes Cisco Trust Anchor technology, a tamper-proof trusted platform module that guarantees the hardware is authentic to protect against supply chain attacks.

The Cisco Catalyst 8200 is a 1 Gbps branch office router with eight CPU cores and 8 GB of RAM, which Cisco says doubles the performance over the existing ISR 4300 series. That is helped by a hardware performance acceleration feature called Intel QuickAssist.

Complementing the Catalyst 8200 is the Cisco Catalyst 8200 uCPE, a branch office customer premises equipment (CPE) device with a small physical footprint. It features eight CPU cores and supports up to 500 Mbps aggregate IPsec performance. It can also take pluggable interface modules (PIMs), which are hardware extensions giving it cellular connectivity options.

Speaking of cellular communications, the new Cisco Catalyst Cellular Gateway 5G plugs into a router to offer on-premises cellular connectivity in sub-6GHz bands. The device uses Power over Ethernet (PoE) to relay cellular communications back to the router.

Admins can manage the devices using software, including vManage, licensed under Cisco’s DNA model. vManage has three tiers: 

  • DNA Essentials offers core SD-WAN, routing, and security features. 
  • DNA Advantage gives subscribers the Essentials package, plus support for advanced routing capabilities including MPLS BGP; better security support with Advanced Malware Protection and SSL proxy features; and access to Cisco’s Cloud OnRamp, which helps with connectivity to multiple cloud service providers.
  • DNA Premier adds centralized security management for all locations.

How to automate your infrastructure with Ansible


Danny Bradbury

2 Dec, 2020

Hands up if you’ve ever encountered this problem: you set up an environment on a server somewhere, and along the way, you made countless web searches to solve a myriad of small problems. By the time you’re done, you’ve already forgotten most of the problems you encountered and what you did to solve them. In six months, you have to set it all up again on another server, repeating each painstaking step and relearning everything as you go.

Traditionally, sysadmins would write bash scripts to handle this stuff. Scripts are often brittle, requiring just the right environment to run in, and it takes extra code to ensure that they account for different edge cases without breaking. Scaling that up to dozens of servers is a daunting task, prone to error.

Ansible solves that problem. It’s an IT automation tool that lets you describe what you want your environment to look like using simple files. The tool then uses those files to go out and make the necessary changes. The files, known as playbooks, support programming steps such as loops and conditionals, giving you lots of control over what happens to your environment. You can reuse these playbooks over time, building up a library of different scenarios.

Ansible is a Red Hat product, and while there are paid versions with additional support and services bolted on, you can install the open-source project for free. It’s a Python-based program that runs on the box you want to administer your infrastructure from, which must be a Unix-like system (typically Linux). It can administer Linux and Windows machines (which we call hosts) without installing anything on them, making it simpler to use at scale. To accomplish this, it uses SSH keys on Linux hosts, or remote PowerShell execution on Windows.

We’re going to show you how to create a simple Linux, Apache, MySQL and PHP (LAMP) stack setup in Ansible.

To start with, you’ll need to install Ansible. That’s simple enough; on Ubuntu, put the PPA for Ansible in your sources file and then tell the OS to go and get it:

$ sudo apt update

$ sudo apt install software-properties-common

$ sudo apt-add-repository --yes --update ppa:ansible/ansible

$ sudo apt install ansible

To test it out, you’ll need a server that has Linux running on it, either locally or in the cloud. You must then create an SSH key for that server on your Ansible box and copy the public key up to the server.
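A typical way to do that looks like the following. We’re assuming the host address 192.168.1.88 that we’ll put in our inventory in a moment, and a login account called danny, the account we use in our ad hoc commands later; substitute your own:

# generate a key pair on the Ansible box (accept the default location)
ssh-keygen -t ed25519

# copy the public key to the host you want to manage
ssh-copy-id danny@192.168.1.88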

Now we can get to the fun part. Ansible uses an inventory file called hosts to define many of your infrastructure parameters, including the hosts that you want to administer. Ansible reads information in key-value pairs, and the inventory file uses either the INI or YAML formats. We’ll use INI for our inventory.

Make a list of the hosts that you’re going to manage by putting them in the inventory file. Modify the default hosts file in your /etc/ansible/ folder, making a backup of the default one first. This is our basic inventory file:

# Ansible hosts

[LAN]
db_server ansible_host=192.168.1.88
db_server ansible_become=yes
db_server ansible_become_user=root

The phrase in the square brackets is your label for a group of hosts that you want to control. You can put multiple hosts in a group, and a host can exist in multiple groups. We gave our host an alias of db_server. Replace the IP address here with the address of the host you want to control.

The next two lines enable Ansible to take control of this server for everything using sudo. ansible_become tells it to become a sudo user, while ansible_become_user tells it which sudoer account to use. Note that we haven’t listed a password here.
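At this point it’s worth checking that Ansible can actually reach the host. Its built-in ping module does just that (here we log in as danny, the account we use throughout):

ansible db_server -m ping -u danny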

You can use Ansible to run shell commands that influence multiple hosts, but it’s better to use modules. These are native Ansible functions that replicate many Linux commands, such as copy (which replicates cp), user, and service to manage Linux services. Here, we’ll use Ansible’s apt module to install Apache on the host.

ansible db_server -m apt -a 'name=apache2 state=present update_cache=true' -u danny --ask-become-pass

The -m flag tells Ansible we’re running a module (apt), while -a specifies the arguments. update_cache=true tells Ansible to refresh the package cache (the equivalent of apt-get update), which is good practice. -u specifies the user account we’re logging in as, while --ask-become-pass tells Ansible to ask us for that user’s password when elevating privileges.

state=present is the most interesting flag. It tells us how we want Ansible to leave things when it’s done. In this case, we want the installed package to be present. You could also use absent to ensure it isn’t there, or latest to install and then upgrade to the latest version.

Then, Ansible tells us the result (truncated here to avoid the reams of stdout text).

db_server | CHANGED => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "cache_update_time": 1606575195,
    "cache_updated": true,
    "changed": true,
    "stderr": "",
    "stderr_lines": [],
Run it again, and you’ll see that changed = false. The script can handle itself whether the software is already installed or not. This ability to get the same result no matter how many times you run a script is known as idempotence, and it’s a key feature that makes Ansible less brittle than a bunch of bash scripts.
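Thanks to that, the other states follow the same pattern. For instance (these are just illustrative variations on the command above):

# remove the package entirely
ansible db_server -m apt -a 'name=apache2 state=absent' -u danny --ask-become-pass

# install the package, or upgrade it to the newest available version
ansible db_server -m apt -a 'name=apache2 state=latest update_cache=true' -u danny --ask-become-pass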

Running ad hoc commands like this is fine, but what if we want to string commands together and reuse them later? This is where playbooks come in. Let’s create a playbook for Apache using the YAML format. We create the following file and save it as /etc/ansible/lampstack.yml:

- hosts: LAN
  gather_facts: yes
  tasks:
  - name: install apache
    apt: pkg=apache2 state=present update_cache=true
  - name: start apache
    service: name=apache2 state=started enabled=yes
    notify:
    - restart apache
  handlers:
    - name: restart apache
      service: name=apache2 state=restarted

hosts tells us which group we’re running this script on. gather_facts tells Ansible to interrogate the host for key facts. This is handy for more complex scripts that might take steps based on these facts.

Playbooks list individual tasks, which you can name as you wish. Here, we have two: one to install Apache, and one to start the Apache service after it’s installed.

notify calls another kind of task known as a handler. This is a task that doesn’t run automatically. Instead, it only runs when another task tells it to. A typical use for a handler is to run only when a change is made on a machine. In this case, we restart Apache if the system calls for it.

Run this using ansible-playbook lampstack.yml --ask-become-pass.

So, that’s a playbook. Let’s take this and expand it a little to install an entire LAMP stack. Update the file to look like this:

- hosts: LAN
  gather_facts: yes
  tasks:
  - name: update apt cache
    apt: update_cache=yes cache_valid_time=3600
  - name: install all of the things
    apt: name={{item}} state=present
    with_items:
      - apache2
      - mysql-server
      - php
      - php-mysql
      - php-gd
      - php-ssh2
      - libapache2-mod-php
      - python3-pip
  - name: install python mysql library
    pip:
      name: pymysql
  - name: start apache
    service: name=apache2 state=started enabled=yes
    notify:
    - restart apache
  handlers:
    - name: restart apache
      service: name=apache2 state=restarted

Note that we’ve moved our apt cache update operation into its own task because we’re going to be installing several things and we don’t need to update the cache each time. Then, we use a loop. The {{item}} variable repeats the apt installation with all the package names indicated in the with_items group. Finally, we use Python’s pip command to install a Python connector that enables the language to interact with the MySQL database.

There are plenty of other things we can do with Ansible, including breaking out more complex Playbooks into sub-files known as roles. You can then reuse these roles to support different Ansible scripts.
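As a rough sketch (the role name apache and the file layout here are purely illustrative), the Apache tasks from our playbook could move into roles/apache/tasks/main.yml, and a top-level playbook would then pull the role in:

# roles/apache/tasks/main.yml
- name: install apache
  apt: pkg=apache2 state=present update_cache=true
- name: start apache
  service: name=apache2 state=started enabled=yes

# site.yml
- hosts: LAN
  roles:
    - apache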

When you’re writing Ansible scripts, you’ll probably run into plenty of errors and speed bumps that will send you searching for answers, especially if you’re not a master at it. The same is true of general sysadmin work and bash scripting, but if you use this research while writing an Ansible script, you’ll have a clear and repeatable recipe for future infrastructure deployments that you can handle at scale.

Getting started with Kubernetes


Danny Bradbury

19 Mar, 2020

Container systems like Docker are a popular way to build ‘cloud native’ applications designed for cloud environments from the beginning. You can have thousands of containers in a typical enterprise deployment, and they’re often even more ephemeral than virtual machines, appearing and disappearing in seconds. The problem with containers is that they’re difficult to manage at scale, load balancing and updating them in turn via the command line. It’s like trying to herd a bunch of sheep by dealing with each animal individually.

Enter Kubernetes. If containers are sheep, then Kubernetes is your sheepdog. You can use it to handle tasks across lots of containers and keep them in line. Google created Kubernetes in 2014 and then launched the Cloud Native Computing Foundation (CNCF) in partnership with the Linux Foundation to offer it as an open project for the community. Kubernetes can work with different container systems, but the most common is Docker.

One problem that Kubernetes solves is IP address management. Docker manages its own IP addresses when creating containers, independently of the host virtual server’s IP in a cloud environment. Containers on different nodes may even have the same IP address as each other. This makes it difficult for containers on different nodes to communicate with each other, and because containers on the same host share the host’s IP address space when exposing services, they can’t all use the same ports. Two containers on the same node can’t each expose a service over the host’s port 80, for example.

Understanding Kubernetes pods and clusters

Kubernetes solves problems like this by grouping containers into pods. All the containers in a pod share the same IP address, and they can communicate with each other over localhost. Kubernetes exposes these pods as services (an example might be a database or a web app). Collections of pods and the nodes they run on are known as clusters, and each container in a clustered pod can talk to containers in other pods using Kubernetes’ built-in name resolution.

You can have multiple pods running on a node (a physical or virtual server). Each node runs its own kubelet, which makes sure the pods on that node are running in the state the cluster expects, along with a kube-proxy, which handles network communication for the pods. Nodes work together to form a cluster.

Kubernetes manages all this using several components. The first is the network overlay, which handles networking between different pods. You can install a variety with a range of capabilities, including advanced ones like the Istio service mesh.

The second component is etcd, which is a database for all the objects in the cluster, storing their details as a series of key:value pairs. etcd runs on a master node, which is a machine used to administer all the worker nodes in the cluster. The master node contains an API server that acts as an interface for all components in the cluster.

A node controller running on the master node handles when nodes go down, while a service controller manages accounts and access tokens so that pods can authenticate and access each other’s services. A replication controller creates running copies of pods across different nodes that run the same services, sharing workloads and acting as backups.

Installing and running Kubernetes

Installing Kubernetes will be different on each machine. It runs not just on Linux, but also on Windows and macOS. In summary, you’ll install your container system (usually Docker) on your master and worker nodes. You’ll also install Kubernetes on each of these nodes, which means installing these tools: kubeadm for cluster setup, kubectl for cluster control, and kubelet, which registers each node with the Kubernetes API server.

You’ll enable your kubelet service on each of these nodes so that it’s ready to talk to the API. Then initialise your cluster by running the kubeadm init command on your master node. This will give you a custom kubeadm join command that you can copy and use to join each worker node to the cluster.
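In outline, and leaving aside the network overlay and other options, that flow looks something like this; the address, token and hash below are placeholders that kubeadm init prints for you:

# on the master node
sudo kubeadm init

# on each worker node, using the values printed by kubeadm init
sudo kubeadm join 192.168.1.10:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>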

After this, you can create a pod. You define the pod’s characteristics using a configuration file known as a PodSpec. This is usually written in YAML (“YAML Ain’t Markup Language”), a human- and machine-readable configuration format. Your YAML file will define the namespace that your pod exists in (namespaces let you partition a single cluster into separate virtual clusters that can share the same physical machines). 

The PodSpec also defines the details for each container inside the pod, including the Docker images on which they’re based. This file can also define a pod-level volume so that the containers in the pod can store data on disk and share it. You can create a pod with a single command, kubectl create, passing it the name of your YAML file.
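A minimal PodSpec might look something like this (the namespace demo, the pod name web and the nginx image are just illustrative choices; you’d need to create the namespace first with kubectl create namespace demo). You’d then run kubectl create -f pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: web
  namespace: demo
  labels:
    app: web
spec:
  containers:
  - name: web
    image: nginx:1.17
    ports:
    - containerPort: 80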

Running copies of a pod for resilience and workload sharing is known as replication, and a collection of replicated pods is called a replica set. While you can handle replica sets directly, you’ll often control them using another kind of Kubernetes object known as a deployment. These are objects in the Kubernetes cluster that you use to create and update replica sets, and clear them away when you’re done with them. Replica sets can contain many pods, and a deployment gives you a strategy to update them all (adding a new version, say).

A YAML-based deployment file also contains a PodSpec. After creating a deployment (and therefore its replica pods) using a simple kubectl create command, you can then update the whole deployment by changing the version of the container image it’s using. Do this using kubectl set image, passing it a new version number for the image. The deployment now updates all the pods with the new specification behind the scenes, taking care to keep a percentage of pods running at all times so that the service keeps working.
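For example, assuming a deployment called web-deployment whose PodSpec runs a container named web (both illustrative names):

# create the deployment (and its replica set) from its YAML file
kubectl create -f deployment.yaml

# roll every pod over to a newer image version
kubectl set image deployment/web-deployment web=nginx:1.18

# watch the rolling update, which keeps a share of pods serving throughout
kubectl rollout status deployment/web-deployment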

This is all great, but how do we actually talk to and reference these pods? If we have, say, ten web server pods in a replica set, we don’t want to work out which one’s IP to visit. That’s where Kubernetes’ services come in. We define a service that exposes that replica set using a single IP address and a service name like ‘marketing-server’. You can connect to the service’s IP address or the service name (using Kubernetes’ DNS service) and the service interacts with the pods behind the scenes to deliver what you need.
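A sketch of such a service, assuming the pods behind it carry the label app: web used in the earlier PodSpec example, might look like this:

apiVersion: v1
kind: Service
metadata:
  name: marketing-server
spec:
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 80

Pods in the same namespace can then reach it by the DNS name marketing-server, and Kubernetes spreads the traffic across whichever pods match the selector.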

That’s a short introduction to Kubernetes. As you can imagine, there’s plenty more to learn. If you’re hoping to manage native cloud services in any significant way, you’re going to bump up against it frequently, so it pays to invest the time in grokking this innovative open source technology as much as you can. With Kubernetes now running on both Amazon Web Services and Azure alongside Google’s cloud service, it’s already behind many of the cloud services that people use today.