The State of DevOps

I first heard the term DevOps about five years ago. I was transitioning from a world where words such as Agile, scrum, iteration and product backlog were the common parlance of developer discourse, with endless debates about what scrum and Agile were and how not to confuse the two. At the same time I was trying to understand what this new kid on the block meant.

Fast forward five years and the term Agile seems to have been quietly swept aside from the marketing hype cycle.

As a marketing fad, the word DevOps has come to mean different things to different organisations. Over the last few years we have heard cries of “Your organisation needs a DevOps team” or “Doing DevOps will guarantee digital transformation!”

A shift in mindset first

The reality is that DevOps is not really a methodology; it is a mindset and cultural change for organisations. If executed correctly, it can deliver huge rewards. I view DevOps as a set of practices and principles that promote changes to how development and operations work:

  • Cultural values that encourage communication and collaboration
  • Eliminating silos
  • Adopting automation at all parts of the delivery cycle
  • Delivering change in an iterative and incremental way (no big bang releases)
  • Making CI/CD a cornerstone of how you deliver applications and Cloud solutions

Executive buy-in is crucial

These characteristics can only be achieved if people, process and product engineering work in unison. Additionally, it is important to demonstrate to the C-Suite, in non-technical language, what business benefits this will bring. If this can’t be done, organisations will have a difficult time convincing CIOs, CTOs and other IT leaders that this is worth investing people, time and funding in. Executives want to know how the business can improve and what the tangible results of adopting this approach are.

What the industry is doing

One of my previous bosses introduced me to the DORA State of DevOps report back in 2018. It is a very useful report, as it analyses how companies across many industries are adopting DevOps.

Industry findings

The latest DORA State of DevOps report, for 2019, reveals some interesting findings. The top three industries among respondents using DevOps to lead digital transformation initiatives were:

  • Technology at 38%
  • Financial Services at 12%
  • Retail at 9%

Is the message understood by C-Level?

In terms of the departments respondents came from, the report lists the three largest groups below, with C-level executives included for comparison:

  • Development or Engineering at 30%
  • DevOps or SRE at 26%
  • Manager at 16%
  • C-level Executive at 4%

The 4% for C-level is an interesting finding. Could this be an indication that the ideas and language are not permeating into C-Suite discussions?

Company size

Another interesting fact is that respondents from larger companies (10,000+ employees) account for only a quarter, whilst half of respondents were from companies with 20-1,999 employees. Again, one of the greatest challenges is changing how organisations work. In traditional matrix enterprises, breaking down silos and flattening the structure can be very challenging!

Are teams getting better?

The report suggests that engineering teams are getting better at delivering against the four key metrics:

  • Deployment frequency (Elite=on demand, multiple deploys per day; Low=between once per month and once every six months)
  • Lead time for change (Elite=Less than one day, Low=between one and six months)
  • Time to restore service (Elite=Less than one hour, Low=between one week and one month)
  • Change failure rate (Elite=0-15%, Low=46-60%)

Compared to 2018, the proportion of teams classified as elite has risen to 20%, almost tripling from the 2018 figure of 7%. Low performers are down slightly, from 15% in 2018 to 12% in 2019.
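To make the metrics above a little more concrete, here is a minimal Python sketch of how three of them could be calculated from a team's own deployment history. The record layout and the sample values are made up for illustration; they are not taken from the DORA report, and time to restore service is left out because it needs incident data.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical deployment records: (commit time, deploy time, caused a failure?)
deployments = [
    (datetime(2019, 9, 2, 9, 0), datetime(2019, 9, 2, 15, 0), False),
    (datetime(2019, 9, 3, 10, 0), datetime(2019, 9, 4, 10, 0), True),
    (datetime(2019, 9, 5, 8, 0), datetime(2019, 9, 5, 12, 0), False),
    (datetime(2019, 9, 6, 9, 0), datetime(2019, 9, 6, 11, 0), False),
]


def deployment_frequency(records, window_days):
    """Average number of deployments per day over the observed window."""
    return len(records) / window_days


def median_lead_time(records):
    """Median time between a change being committed and being deployed."""
    return median(deployed - committed for committed, deployed, _ in records)


def change_failure_rate(records):
    """Share of deployments that led to a failure in production."""
    failures = sum(1 for _, _, failed in records if failed)
    return failures / len(records)


print("Deployments per day:", deployment_frequency(deployments, window_days=7))
print("Median lead time:", median_lead_time(deployments))
print("Change failure rate:", change_failure_rate(deployments))
```

Tracking these numbers over time, rather than comparing a single snapshot against the Elite and Low bands, is what shows whether a team is actually improving.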

Public Cloud leads

The report also indicates that more and more companies are using multiple clouds. Public cloud leads application hosting at 50%. This is no surprise, as automation, programmable infrastructure (IaC) and CI/CD are first-class citizens there.

Continuous improvement

The best way to start improving is to take an incremental and continuous improvement approach. There is no point jumping straight into a toolchain or going on a hiring spree. First understand the pain points and bottlenecks in your processes. Select a particular area and then use the principles of automation, CI/CD and source control to improve it. One step at a time.


To summarise, DevOps does not seem to be a fad. However, what is clear is that the marketing language does not always translate into C-Suite interest. If you want to succeed, executive buy-in is a must. To read the DORA report please use the following link.

Click to access state-of-devops-2019.pdf

Compliance via Code

Cloud Governance with Cloud Custodian – Part 1

Before talking about Cloud Custodian I would like to mention Azure Policy.

Azure Policy is the out-of-the-box policy engine that Microsoft provides as part of your Azure subscription. It uses a declarative JSON syntax to define policies (security, audit and others) that govern and audit your compute, network and storage resources.

Codifying traditional manual checks such as these is great. It unleashes the power of automation: CI/CD pipelines can deploy and assign policies to secure customer environments in a repeatable and consistent manner.

Azure Policy can prevent resources from being provisioned if they are not compliant with a policy, and it also reports on policy compliance for governance and audit purposes. Please refer to the official documentation for further details.

However, my interest in other clouds such as AWS prompted me to look for other solutions in this area. For me, building tools on abstractions allows reuse and gives a common interface to the multitude of Cloud providers that exist today.

Multi cloud rules engine

Cloud Custodian is a rules engine that can be run against different Cloud providers (AWS, Google, Azure) to perform security and compliance checks and to apply actions to cloud resources. I would term it compliance as code. Please check the project documentation for further information.

How it works

Cloud Custodian is written in Python.

Conceptually, Cloud Custodian makes use of the following:

  • The resource type you are going to run a policy against (for example an S3 bucket in AWS or Blob Storage in Azure)
  • Filters that narrow the policy down to specific resources of that type in a Cloud provider
  • Actions that are applied to the filtered resources to produce the policy effect
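The three concepts above can be sketched in a few lines of plain Python. This is only an illustration of the resource, filter and action idea using an in-memory list; it is not how Cloud Custodian is implemented, and the resource names, the `run_policy` function and its helpers are hypothetical.

```python
# Hypothetical in-memory stand-ins for cloud resources.
inventory = [
    {"type": "vm", "name": "web-01", "tags": {}},
    {"type": "vm", "name": "build-01", "tags": {}},
    {"type": "storage", "name": "web-01", "tags": {}},
]


def value_filter(key, value):
    """Build a filter that matches resources whose `key` equals `value`."""
    return lambda resource: resource.get(key) == value


def tag_action(tag, value):
    """Build an action that sets a tag on a resource."""
    def apply(resource):
        resource["tags"][tag] = value
    return apply


def run_policy(resource_type, filters, actions, resources):
    """Pick one resource type, narrow it with filters, then apply the actions."""
    selected = [r for r in resources if r["type"] == resource_type]
    for matches in filters:
        selected = [r for r in selected if matches(r)]
    for resource in selected:
        for action in actions:
            action(resource)
    return selected


matched = run_policy(
    "vm",
    filters=[value_filter("name", "web-01")],
    actions=[tag_action("Environment", "Dev")],
    resources=inventory,
)
print([r["name"] for r in matched], matched[0]["tags"])
```

Only the VM named web-01 is touched: the storage account with the same name is excluded by the resource type, and the other VM is excluded by the filter.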

Trying it out on Azure

I have been working with Azure for a couple of years, so this post will show examples in Azure. I will follow up with an AWS example in a future post.

Before jumping into a deep dive, follow the installation instructions to install it on your machine of choice.

I decided to install it on my Mac.

As it is a Python-based tool, it is installed into a Python virtual environment so that its Python dependencies are sandboxed.

Cloud Custodian authenticates with Azure AD so that it can perform actions at the management plane.

To do this you need to create a service principal and use its credentials in your bash session:

# create Service Principal
az ad sp create-for-rbac --name policy-sp

{
  "appId": "03529a99-57db-479f-b200-2c444917481d",
  "displayName": "policy-sp",
  "name": "http://policy-sp",
  "password": "****************************",
  "tenant": "69a1b9aa-fa4a-4015-be11-ebfb7f149410"
}

Once your service principal has been created, take the appId (use as CLIENT_ID), the tenant and the password (use as CLIENT_SECRET) from the output of the previous CLI command and use them to set the shell variables that Cloud Custodian reads.
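The mapping from the service principal output to shell variables can be sketched as below. This assumes the variable names documented for Cloud Custodian's Azure provider (AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET and AZURE_SUBSCRIPTION_ID); check them against the version you install. The `service_principal_env` helper and the placeholder values are hypothetical.

```python
import json
import shlex


def service_principal_env(sp_output, subscription_id):
    """Map `az ad sp create-for-rbac` JSON output to environment variables.

    The variable names are an assumption based on the Cloud Custodian
    Azure provider documentation.
    """
    sp = json.loads(sp_output)
    return {
        "AZURE_TENANT_ID": sp["tenant"],
        "AZURE_CLIENT_ID": sp["appId"],
        "AZURE_CLIENT_SECRET": sp["password"],
        "AZURE_SUBSCRIPTION_ID": subscription_id,
    }


# Placeholder values only; never commit or log real credentials.
example_output = json.dumps({
    "appId": "<app-id>",
    "displayName": "policy-sp",
    "password": "<password>",
    "tenant": "<tenant-id>",
})

env = service_principal_env(example_output, "<subscription-id>")
for name, value in env.items():
    # shlex.quote keeps the value safe to paste into a bash session.
    print(f"export {name}={shlex.quote(value)}")
```

The printed export lines can be pasted into the bash session that will run custodian.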


Once that is done you are almost ready to create your own policies! First use the Azure CLI to set your default subscription:

az account set --subscription 4accb027-568f-45c2-aad0-d6b609d1a5ac

Defining my first policy

So, using the example from the Cloud Custodian docs, I decided to try out the policy that tags an existing VM. I have an existing VM called examdevbox in a dev Azure subscription.

    policies:
      - name: add-vm-tag-policy
        description: |
          Adds a tag to a virtual machine
        resource: azure.vm
        filters:
          - type: value
            key: name
            value: examdevbox
        actions:
          - type: tag
            tag: Environment
            value: Devs

Distilling the above, the key components are:

  • resource: azure.vm – the resource type, here Azure virtual machines
  • filters: select the VM whose name has the value examdevbox
  • actions: add the tag called Environment with the value Devs

For further information on filters, see the Cloud Custodian documentation.

Using the Cloud Custodian CLI I can now apply the policy!

custodian run --output-dir=. addtag.yml
2020-01-07 11:43:47,304: Authenticated [Azure CLI | 8669867b-c6be-419c-8aff-e49945115767]
2020-01-07 11:43:48,006: custodian.policy:INFO policy:add-vm-tag-policy resource:azure.vm region: count:1 time:0.70
2020-01-07 11:43:48,009: Action 'tag' modified 'examdevbox' in resource group 'RG-MVP-EXAMDEVBOX'.
2020-01-07 11:43:48,014: custodian.policy:INFO policy:add-vm-tag-policy action:tag resources:1 execution_time:0.01

Once the policy engine has run it prints out the result. In the above example it found a VM called examdevbox in the resource group RG-MVP-EXAMDEVBOX and tagged it!

Validating your policy

It’s good to see the CLI has a validate command. I introduced a typo into the policy (bad code):

    policies:
      - name: add-vm-tag-policy
        description: |
          Adds a tag to a virtual machine
        resource: azure.vm
        filters:
          - type: value
            key: name
            value: examdevbox
        actions:
          - type: tag
            tag: Environment
            value: Devs
            bad code

Running the validate command (output truncated):

custodian validate addtag.yml
Traceback (most recent call last):
  File "/Users/romeelkhan/development/Policies/custodian/bin/custodian", line 11, in <module>
    load_entry_point('c7n==', 'console

This shortens the failure feedback loop, because once validation runs in a CI/CD pipeline you can make sure your teams are not checking in policies whose syntax is incorrect!

Running reports

I followed the online docs to run the report but was having no luck. After a few tries I got it working:

custodian report --output-dir=. --format grid --field tags=tags addtag.yml

| name       | location   | resourceGroup     | properties.hardwareProfile.vmSize   | tags                    |
| examdevbox | westeurope | RG-MVP-EXAMDEVBOX | Standard_F8s_v2                     | {'Environment': 'Devs'} |

Make sure you run custodian run beforehand, as the report uses the resources.json generated for the policy file!
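To see why the run has to come first, here is a small Python sketch that builds the same kind of rows from a resources.json file. It assumes the file is a JSON list of resources stored under a folder named after the policy inside the output directory; that layout, and the `get_path` and `load_rows` helpers, are assumptions for illustration rather than a description of how the report command works internally. The sample record reuses the values from the report output above.

```python
import json
import tempfile
from pathlib import Path


def get_path(resource, dotted_key):
    """Follow a dotted key such as 'properties.hardwareProfile.vmSize'."""
    value = resource
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def load_rows(output_dir, policy_name, fields):
    """Read the saved resources for a policy and pick out the given fields."""
    path = Path(output_dir) / policy_name / "resources.json"
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; run the policy first")
    resources = json.loads(path.read_text())
    return [[get_path(r, field) for field in fields] for r in resources]


# Simulate the output of a previous run in a temporary directory.
with tempfile.TemporaryDirectory() as output_dir:
    policy_dir = Path(output_dir) / "add-vm-tag-policy"
    policy_dir.mkdir()
    sample = [{
        "name": "examdevbox",
        "location": "westeurope",
        "properties": {"hardwareProfile": {"vmSize": "Standard_F8s_v2"}},
        "tags": {"Environment": "Devs"},
    }]
    (policy_dir / "resources.json").write_text(json.dumps(sample))

    fields = ["name", "location", "properties.hardwareProfile.vmSize", "tags"]
    rows = load_rows(output_dir, "add-vm-tag-policy", fields)
    print(rows)
```

If the file is missing, the sketch fails with a clear message instead of producing an empty report, which is the situation to watch out for when the report is run before the policy.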

Pipeline first steps

So, we have demonstrated it from the CLI. But the reality is that in most organisations teams develop software and products, and update and manage cloud infrastructure. To make this process consistent, repeatable and driven by changes made via source control, we need a CI pipeline. In this example I will use Azure Pipelines in Azure DevOps, but the same approach can be applied to any other CI tool of choice, such as Jenkins or GitLab.

To accomplish this I created a public project in my Azure DevOps organization. I then integrated Azure Pipelines with GitHub as the source control. This allowed me to create the following:

trigger:
- master

jobs:
  - job: 'Validate'
    pool:
      vmImage: 'Ubuntu-16.04'
    steps:
      - checkout: self
      - task: UsePythonVersion@0
        displayName: "Set Python Version"
        inputs:
          versionSpec: '3.7'
          architecture: 'x64'
      - script: pip install --upgrade pip
        displayName: Upgrade pip
      - script: pip install c7n c7n_azure
        displayName: Install custodian
      - script: custodian validate stopped-vm.yml
        displayName: Validate policy file

This simple pipeline is the equivalent of compiling code in a language such as Java or C# and then getting feedback on whether a check-in broke the build.

The key part is the script step ‘Validate policy file’, which acts like a linter for the policy file.

Other scenarios

The above demonstration was a small example of a policy effect. However, there are other scenarios organisations may consider to increase the security or governance of their cloud environment. These include cases such as:

  • Preventing public IP addresses from being provisioned
  • Ensuring storage accounts are secured with HTTPS
  • Blocking resources from being deployed in non-compliant regions (AWS and Azure)
  • Ensuring only approved Azure images or Amazon AMI images are used

Just like Azure Policy, Cloud Custodian allows security to be baked into the development lifecycle through concepts such as shift left and fail fast. By leveraging this tool via code and CI/CD practices, security engineers and consultants, development teams and DevOps engineers can ensure security and governance are not an afterthought.

Comparing it with Azure Policies

There are some caveats you need to be aware of when you use Cloud Custodian. Cloud Custodian cannot prevent the deployment of resources; Azure Policy can. Instead, Cloud Custodian takes a mutating approach: it won’t block deployments but will take the corrective action for you. For example, missing tags will be added, whereas Azure Policy would block the deployment.

Cloud Custodian also needs some time and effort to learn the filters and behaviours to use.

The best approach is to use it as an additional compliance layer for the areas you deem Azure policy does not cover.
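The difference between the two styles can be shown with a toy Python example. It models the two behaviours described in this section, a preventive check that refuses a request and a corrective one that fixes an existing resource. Neither function calls Azure Policy or Cloud Custodian; the names and the default tag value are hypothetical.

```python
REQUIRED_TAG = "Environment"


class DeploymentDenied(Exception):
    """Raised when a preventive check refuses a deployment request."""


def deny_if_untagged(request):
    """Preventive style: refuse the request before anything is created."""
    if REQUIRED_TAG not in request["tags"]:
        raise DeploymentDenied(f"{request['name']} is missing tag {REQUIRED_TAG}")
    return request


def remediate_if_untagged(resource, default="Devs"):
    """Corrective style: the resource already exists, so fix it in place."""
    resource["tags"].setdefault(REQUIRED_TAG, default)
    return resource


untagged_request = {"name": "examdevbox", "tags": {}}
try:
    deny_if_untagged(untagged_request)
    denied = False
except DeploymentDenied as error:
    denied = True
    print("Denied:", error)

existing_vm = {"name": "examdevbox", "tags": {}}
remediate_if_untagged(existing_vm)
print("Remediated:", existing_vm["tags"])
```

With the corrective style there is a window in which the non-compliant resource exists, which is one reason to treat it as an additional layer rather than a replacement for blocking policies.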

Azure Blueprints

I could not finish without mentioning Azure Blueprints. Azure Blueprints (still in preview) is an interesting idea. The key difference from Azure Policy is that it packages IaC, RBAC and policy into the lifecycle of an environment that is pre-configured to meet a compliance standard. This means you can deploy blueprints that meet key industry and compliance standards such as:

  • ISO 27001

Next steps

In my next article I will show how you can deploy Cloud Custodian to AWS. The code for this is in the original repo I created way back in January this year, and this article is based on the original instructions.

The Cloud native landscape

Over the last couple of years the Cloud native bandwagon has been gaining traction. Public cloud providers are becoming a cornerstone of organisations’ strategies to disrupt, innovate and surface products and services to end consumers in ways they could never have imagined in the days of static infrastructure in traditional data centres.

Cloud native trend

Google Trends data indicates a growing interest in the concept of Cloud native. However, what does it really mean to be Cloud native?

The age of disruption

The IT industry is constantly evolving. First there was the PC revolution of the 70s and 80s. Then there was the internet revolution of the 2000s and, more recently, the mobile phone revolution. One constant theme in all of this is that the organisations that were quick to update the way they worked with technology and delivered software and services were always a step ahead of the crowd.

The era of Cloud computing is no different. However, one of the biggest leaps organisations must think about is how they consume and utilise the power of these services. We are currently in the midst of an age of digital disruption in which unicorns and startups are pushing enterprise companies off their pedestals. And one common theme amongst these startups is their Cloud native approach!

Cloud native thinking

The CNCF has a nice broad definition of the term “Cloud Native”:

Cloud-native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.

When we look at organisations such as Netflix and Uber, or even more recent startups such as Airbnb, there are some key properties of Cloud native thinking that they were quick to exploit. These include:

  • The use of Dynamic infrastructure and viewing infrastructure as immutable (Pets vs Cattle)
  • Breaking down the traditional monolith into microservices to independently deploy and scale services (Scale cube)
  • Applying principles from 12 factor app design (https://12factor.net)
  • All of the above underpinned by organisations who adopt DevOps practices such as CI/CD, automation, IaC, automated testing, observability

Therefore, before jumping on the Cloud native bandwagon, organisations should ask themselves whether their IT teams are set up to benefit from operating in this manner.

The key to success is a shift in thinking, both in how the organisation approaches Cloud native projects and in how IT teams structure themselves to enable the adoption of these modern practices.

The shift to Cloud Native infrastructure

A key foundation for any Cloud Native application is the infrastructure it runs on. But what does the term Cloud Native infrastructure mean?

Infrastructure as Code has enabled teams to declaratively define the infrastructure of their environments using tools such as Terraform, Puppet and Chef. But the reality is that these tools are run by an operator, either through a CLI or as part of an infrastructure pipeline using a CI/CD tool such as Jenkins, Azure DevOps or even GitHub Actions.

Cloud native infrastructure extends the notion of IaC by doing the following:

  • Constantly monitor the state of the infrastructure (Desired state)
  • Query a Cloud control plane to work out the current state
  • Reconcile the differences to ensure the desired state (Reconciler pattern) by dynamically mutating the infrastructure

We find this pattern in infrastructure management software such as Terraform. Kubernetes also follows this pattern when it maintains the desired state of a cluster.
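The reconciler pattern described above can be sketched in a few lines of Python. This is a toy model that compares two dictionaries; it is not how Terraform or Kubernetes implement reconciliation, and the `reconcile` and `apply` functions and the sample state are hypothetical.

```python
def reconcile(desired, current):
    """Work out the actions needed to move `current` towards `desired`."""
    actions = []
    for name, spec in desired.items():
        if name not in current:
            actions.append(("create", name, spec))
        elif current[name] != spec:
            actions.append(("update", name, spec))
    for name in current:
        if name not in desired:
            actions.append(("delete", name, None))
    return actions


def apply(actions, current):
    """Mutate the current state; a real system would call a Cloud API here."""
    for verb, name, spec in actions:
        if verb == "delete":
            del current[name]
        else:
            current[name] = spec


# Desired state as declared, and current state as reported by a control plane.
desired = {"web": {"replicas": 3}, "worker": {"replicas": 2}}
current = {"web": {"replicas": 1}, "legacy": {"replicas": 1}}

actions = reconcile(desired, current)
print(actions)
apply(actions, current)

# A second pass finds nothing left to do once the states match.
print(reconcile(desired, current))
```

In a Cloud native platform this comparison runs continuously, so drift is corrected without an operator having to trigger a pipeline.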

The road ahead

In my blog I will be exploring the Cloud native world from many angles, addressing topics such as Kubernetes, serverless and how organisations should modernise legacy applications into Cloud native apps. In addition, I will also be talking about other Cloud topics such as DevOps/SecOps, identity and security, as well as important enterprise topics like governance.

Join me on my journey into a Cloud native world!