Compliance via Code

Cloud Governance with Cloud Custodian – Part 1

Before talking about Cloud Custodian, I would like to mention Azure Policy.

Azure Policy is the out-of-the-box policy engine that Microsoft provides as part of your Azure subscription. It uses a declarative JSON syntax to define policies (security, audit and others) that govern and audit your compute, network and storage resources.

Codifying traditional manual checks such as these is great. It unleashes the power of automation: CI/CD pipelines can deploy and assign policies to secure customer environments in a repeatable and consistent manner.

Azure Policy can be used to prevent resources from being provisioned if they are not compliant with a policy, as well as to provide information on policy compliance for governance and audit purposes. Please refer to the official documentation for further details.

However, having an interest in other clouds such as AWS prompted me to look for other solutions in this area. For me, building tools on abstractions allows reuse and gives a common interface to the multitude of cloud providers that exist today.

Multi cloud rules engine

Cloud Custodian is a rules engine that can be run against different cloud providers (AWS, Google, Azure) for security and compliance checks, and it can apply actions on cloud resources. I would term it compliance as code. Please check the project documentation for further information.

How it works

Cloud Custodian is written in Python.

Conceptually, a Cloud Custodian policy is made up of the following:

  • The resource type you are going to run the policy against (for example an S3 bucket on AWS or Blob Storage on Azure)
  • Filters that narrow that resource type down to the specific resources you care about
  • Actions that will be applied to those filtered resources to produce the policy effect
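To make the resource, filter and action idea concrete, here is a small Python sketch. It is a toy model only, not Cloud Custodian's actual implementation: the function names (`value_filter`, `tag_action`, `run_policy`) and the in-memory inventory are hypothetical and exist purely for illustration.

```python
# Toy model of the resource -> filters -> actions flow.
# This is NOT Cloud Custodian's real code; all names here are illustrative.

def value_filter(key, value):
    """Return a predicate that keeps resources whose `key` equals `value`."""
    return lambda resource: resource.get(key) == value


def tag_action(tag, value):
    """Return an action that sets a tag on a resource (mutates it in place)."""
    def apply(resource):
        resource.setdefault("tags", {})[tag] = value
    return apply


def run_policy(resources, filters, actions):
    """Keep resources that pass every filter, then apply every action to them."""
    matched = [r for r in resources if all(f(r) for f in filters)]
    for resource in matched:
        for action in actions:
            action(resource)
    return matched


# Hypothetical inventory standing in for what a cloud API would return.
vms = [
    {"name": "dev-vm", "tags": {}},
    {"name": "other-vm", "tags": {}},
]

matched = run_policy(
    vms,
    filters=[value_filter("name", "dev-vm")],
    actions=[tag_action("Environment", "Devs")],
)
print([vm["name"] for vm in matched])
```

Only the VM that passes the filter is touched; everything else in the inventory is left alone, which is the behaviour the three bullets describe.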

Trying it out on Azure

I have been working with Azure for a couple of years, so this page will show examples in Azure. I will follow up with an AWS example in a future post.

Before jumping into a deep dive, read the installation guide to install it on your machine of choice.

I decided to install it on my Mac.

As it is a Python-based tool, you install it into a Python virtual environment so you can sandbox the Python dependencies.

Cloud Custodian will authenticate with Azure AD so it can perform actions at the management plane.

To do this you need to create a Service principal and use the credentials in your bash session:

# create Service Principal
az ad sp create-for-rbac --name policy-sp

{
   "appId": "03529a99-57db-479f-b200-2c444917481d",
   "displayName": "policy-sp",
   "name": "http://policy-sp",
   "password": "****************************",
   "tenant": "69a1b9aa-fa4a-4015-be11-ebfb7f149410"
}

Once your service principal has been created, take the appId (use as CLIENT_ID), tenant and password (use as CLIENT_SECRET) from the output of the previous CLI command and set them as shell variables.
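As far as I know, the Cloud Custodian Azure provider reads service principal credentials from the environment variables AZURE_TENANT_ID, AZURE_SUBSCRIPTION_ID, AZURE_CLIENT_ID and AZURE_CLIENT_SECRET. Treat those names as an assumption and confirm them against the documentation for your version. The sketch below is a small preflight check, with a hypothetical helper name, that reports which of them are still unset before you run a policy.

```python
import os

# Variable names assumed from the Cloud Custodian Azure provider docs
# (service principal authentication); verify against your installed version.
REQUIRED_VARS = (
    "AZURE_TENANT_ID",
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_CLIENT_ID",
    "AZURE_CLIENT_SECRET",
)


def missing_credentials(environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All service principal variables are set.")
```

Passing the environment in as a parameter keeps the check easy to exercise without touching your real shell session.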


Once complete you are ready to create your own policies! Next, use the Azure CLI to set your default subscription:

az account set --subscription 4accb027-568f-45c2-aad0-d6b609d1a5ac

Defining my first policy

So, using the example from the Cloud Custodian docs, I decided to try out the policy that tags an existing VM. I have an existing VM called examdevbox in a dev Azure subscription.

policies:
    - name: add-vm-tag-policy
      description: |
        Adds a tag to a virtual machines
      resource: azure.vm
      filters:
        - type: value
          key: name
          value: examdevbox
      actions:
       - type: tag
         tag: Environment
         value: Devs

Distilling the above, the key components are:

  • resource: azure.vm – This is the virtual machine resource type in Azure
  • filters: Match the VM whose name has the value examdevbox
  • actions: Add the tag called Environment with the value Devs

For further information on filters, see the Cloud Custodian documentation.

Using the Cloud Custodian CLI, I can now apply the policy!

custodian run --output-dir=. addtag.yml
2020-01-07 11:43:47,304: Authenticated [Azure CLI | 8669867b-c6be-419c-8aff-e49945115767]
2020-01-07 11:43:48,006: custodian.policy:INFO policy:add-vm-tag-policy resource:azure.vm region: count:1 time:0.70
2020-01-07 11:43:48,009: Action 'tag' modified 'examdevbox' in resource group 'RG-MVP-EXAMDEVBOX'.
2020-01-07 11:43:48,014: custodian.policy:INFO policy:add-vm-tag-policy action:tag resources:1 execution_time:0.01

Once the policy engine runs, it prints out the result. In the above example it found a VM called examdevbox in resource group RG-MVP-EXAMDEVBOX and tagged it!

See screenshot:

[Screenshot: the Environment tag applied to examdevbox]

Validating your policy

It's good to see the CLI has a validate command. I introduced a typo into the policy (bad code):

policies:
    - name: add-vm-tag-policy
      description: |
        Adds a tag to a virtual machines
      resource: azure.vm
      filters:
        - type: value
          key: name
          value: examdevbox
      actions:
       - type: tag
         tag: Environment
         value: Devs
         bad code

Running the validate command (output truncated):

custodian validate addtag.yml
Traceback (most recent call last):
  File "/Users/romeelkhan/development/Policies/custodian/bin/custodian", line 11, in <module>
    load_entry_point('c7n==', 'console

This shortens the failure feedback loop: once validation runs in a CI/CD pipeline, you can make sure your teams are not checking in policies whose syntax is incorrect!
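If you want the same gate outside a hosted pipeline, a thin wrapper can run the validate command over several policy files and fail on the first sign of trouble. The sketch below assumes `custodian validate` exits with a non-zero status for an invalid file, which is what the traceback above suggests; `validate_policies` is a hypothetical helper name, and the `run` parameter is only there so the function can be exercised without the CLI installed.

```python
import subprocess
import sys


def validate_policies(paths, run=subprocess.run):
    """Run `custodian validate` on each file and return the files that failed.

    Assumes the CLI signals an invalid policy through a non-zero exit code.
    """
    failed = []
    for path in paths:
        result = run(["custodian", "validate", path])
        if result.returncode != 0:
            failed.append(path)
    return failed


if __name__ == "__main__":
    # Usage: python validate_all.py addtag.yml other-policy.yml
    bad = validate_policies(sys.argv[1:])
    if bad:
        print("Invalid policy files:", ", ".join(bad))
        sys.exit(1)
```

Exiting non-zero is what lets a CI tool mark the build as broken.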

Running reports

I followed the online docs to run the report but was having no luck. After a few tries I got it working:

custodian report --output-dir=. --format grid --field tags=tags addtag.yml

| name       | location   | resourceGroup     | properties.hardwareProfile.vmSize   | tags                    |
| examdevbox | westeurope | RG-MVP-EXAMDEVBOX | Standard_F8s_v2                     | {'Environment': 'Devs'} |

Make sure you run custodian run beforehand, as the report uses the resources.json generated for the policy file!
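Because the report is built from that resources.json, you can also read the file yourself. The sketch below assumes the run writes it to a sub-directory of the output directory named after the policy (for example ./add-vm-tag-policy/resources.json) and that it holds a JSON list of resource objects. Check both assumptions against your own output. The helper names are hypothetical.

```python
import json
from pathlib import Path


def load_resources(output_dir, policy_name):
    """Load the resources a previous `custodian run` recorded for a policy.

    Assumed layout: <output_dir>/<policy_name>/resources.json
    """
    path = Path(output_dir) / policy_name / "resources.json"
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; run `custodian run` first")
    return json.loads(path.read_text())


def summarize(resources, fields=("name", "location", "resourceGroup", "tags")):
    """Pick a few top-level fields from each resource, similar to the report columns."""
    return [{field: r.get(field) for field in fields} for r in resources]


if __name__ == "__main__":
    for row in summarize(load_resources(".", "add-vm-tag-policy")):
        print(row)
```

The explicit error for a missing file mirrors the pitfall above: no earlier run, no data to report on.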

Pipeline first steps

So, we have demonstrated it from the CLI. But the reality is that in most organizations, teams develop software and products, and update and manage cloud infrastructure. To make this process consistent, repeatable and driven by changes made via source control, we need a CI pipeline. In this example, I will use an Azure pipeline in Azure DevOps. The same approach can be applied to any other CI tool of choice, such as Jenkins or GitLab.

To accomplish this I created a public project in my Azure DevOps organization. I then integrated Azure Pipelines with GitHub as the source control. This allowed me to create the following pipeline definition:

trigger:
- master

jobs:
  - job: 'Validate'
    pool:
      vmImage: 'Ubuntu-16.04'
    steps:
      - checkout: self
      - task: UsePythonVersion@0
        displayName: "Set Python Version"
        inputs:
          versionSpec: '3.7'
          architecture: 'x64'
      - script: pip install --upgrade pip
        displayName: Upgrade pip
      - script: pip install c7n c7n_azure
        displayName: Install custodian
      - script: custodian validate stopped-vm.yml
        displayName: Validate policy file

This simple pipeline is the equivalent of compiling code in a language such as Java or C# and then getting feedback if the code check-in broke the build.

The key part is the script step ‘Validate policy file’, which acts like a linter step to check the policy file.

Saving and running the pipeline produces:

[Screenshot: result of the pipeline run]

Other scenarios

The above demonstration was a small example of a policy effect. However, there are other scenarios organisations may consider to increase the security or governance of their cloud environment. These include, for example:

  • Preventing Public IP being provisioned
  • Ensuring storage accounts are secured with HTTPS
  • Making sure to block resources being deployed in non-compliant regions (AWS and Azure)
  • Ensuring only approved Azure images or Amazon AMI images are used

Just like Azure Policy, Cloud Custodian allows security to be baked into the development lifecycle through concepts such as shift left and fail fast. By leveraging this tool via code and CI/CD practices, security engineers/consultants, development teams and DevOps engineers can ensure security and governance are not an afterthought.

Comparing it with Azure Policies

There are some caveats you need to be aware of when you use Cloud Custodian. Unlike Azure Policy, Cloud Custodian cannot prevent the deployment of resources. Instead, it takes a mutating approach: it won’t block a deployment, but it will make the corrective action for you afterwards. For example, missing tags will be added, whereas Azure Policy would block the deployment.
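The difference between the two styles can be sketched in a few lines of Python. This is a toy illustration only: neither function reflects how Azure Policy or Cloud Custodian is implemented, and the names, the required tag and the default value are all hypothetical.

```python
REQUIRED_TAG = "Environment"  # hypothetical required tag, for illustration only


class DeploymentDenied(Exception):
    """Raised when a deployment request is rejected up front."""


def deny_style(request):
    """Preventive control: reject the request before anything is created."""
    if REQUIRED_TAG not in request.get("tags", {}):
        raise DeploymentDenied(f"missing required tag: {REQUIRED_TAG}")
    return request


def remediate_style(resource, default="Devs"):
    """Corrective control: the resource already exists, so fix it in place."""
    resource.setdefault("tags", {}).setdefault(REQUIRED_TAG, default)
    return resource


if __name__ == "__main__":
    untagged = {"name": "vm1"}
    try:
        deny_style(untagged)
    except DeploymentDenied as err:
        print("denied:", err)
    print("remediated:", remediate_style(untagged))
```

With the preventive style the non-compliant resource never exists; with the corrective style it exists for a while in a non-compliant state until the policy next runs.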

Also, Cloud Custodian needs some time and effort to learn the filters and behaviours to use.

The best approach is to use it as an additional compliance layer for the areas you deem Azure policy does not cover.

Azure Blueprints

I could not forget to talk about Azure Blueprints. Azure Blueprints (still in preview) is an interesting idea. The key difference from Azure Policy is that it packages IaC, RBAC and policy into the lifecycle of an environment that is pre-opinionated to meet a compliance standard. This means you can deploy blueprints that meet key industry and compliance standards like:

  • ISO 27001

Next steps

In my next article I will show how you can deploy Cloud Custodian to AWS. The code for this is in my original repo, which I created way back in January this year, and this article is based on the original instructions.

