Google Cloud Functions

In this blog we take a look at Google Cloud Functions.

Google Cloud Functions is a serverless platform for building event-driven logic that is short in execution time (don’t use it for long-running processing jobs).

Cloud Functions is Google’s serverless offering, equivalent to AWS Lambda and Azure Functions.

Key features

Cloud Functions is billed based on:

  • Number of invocations
  • Execution time
  • CPU and memory allocation

So it is really important that you design your functions to be stateless, idempotent and short-lived. Cloud Functions is great for building reactive microservices that are triggered by an event such as (a minimal example follows this list):

  • Pub/Sub message
  • Cloud Storage bucket event
  • HTTP trigger
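For example, a minimal Pub/Sub-triggered background function in Python looks like the sketch below (the function name handle_message is just a placeholder):

import base64

def handle_message(event, context):
    """Background Cloud Function triggered by a message on a Pub/Sub topic."""
    if 'data' in event:
        # Pub/Sub message payloads arrive base64-encoded.
        message = base64.b64decode(event['data']).decode('utf-8')
        print(f'Received message: {message}')

It would be deployed with gcloud functions deploy handle_message --runtime python39 --trigger-topic my-topic, where my-topic is a placeholder topic name.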

Supported Runtimes

Cloud Functions currently supports the following runtimes:

  • Node.js
  • Python
  • .NET
  • Go
  • Java
  • PHP
  • Ruby

For a full list of supported runtime versions please visit:

https://cloud.google.com/functions/docs/concepts/exec

Use cases

Serverless platforms are very useful for building loosely coupled software that is triggered by events. Some interesting use cases include:

  • Generating thumbnails of images uploaded to Cloud Storage
  • Generating PDF invoices as part of a workflow
  • Processing messages from a Pub/Sub topic to offload asynchronous work
  • Calling APIs such as the Vision API to classify images

and many more…

Example code

For this blog I wrote a Cloud Function in Python that watermarks a PDF uploaded to a Cloud Storage bucket. The following code shows the function that reacts to the upload event:

import os
import tempfile
from PyPDF2 import PdfFileReader, PdfFileWriter
from google.cloud import storage

# Created at module scope so warm instances reuse the same client.
storage_client = storage.Client()

watermark_file_name = 'watermark.pdf'

def watermark_file(event, context):
    """Background Cloud Function triggered by Cloud Storage.
       Watermarks a PDF uploaded to the input bucket.
    Args:
        event (dict):  The dictionary with data specific to this type of event.
                       The `data` field contains a description of the event in
                       the Cloud Storage `object` format described here:
                       https://cloud.google.com/storage/docs/json_api/v1/objects#resource
        context (google.cloud.functions.Context): Metadata of triggering event.
    Returns:
        None; the output is written to Cloud Logging.
    """
    output_bucket_name = os.environ.get('WATERMARK_OUTPUT_BUCKET_NAME')
    print(f'Output bucket: {output_bucket_name}')

    print_function_meta_data(context, event)

    uploaded_file = event['name']
    input_bucket_name = event['bucket']

    if not uploaded_file.endswith('.pdf'):
        print('Invalid file format uploaded. Function will not watermark')
        return

    print(f'Reading from bucket: {input_bucket_name}')
    print(f'Reading the file to watermark: {uploaded_file}')

    input_blob = storage_client.bucket(input_bucket_name).get_blob(uploaded_file)
    watermark_blob = storage_client.bucket(output_bucket_name).get_blob(watermark_file_name)

    if watermark_blob is None:
        print(f'Failed to read: {watermark_file_name}. Function cannot watermark!')
        return

    watermark_pdf(input_blob, watermark_blob)
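
The two helpers called above, print_function_meta_data and watermark_pdf, are defined in the full source linked at the end of this post. As an illustration of the watermarking step, a minimal watermark_pdf could look something like the sketch below. It reuses the module-level imports and storage_client from the snippet above; the output object name and the exact PyPDF2 calls are my assumptions, not necessarily what the repo does.

def watermark_pdf(input_blob, watermark_blob):
    """Sketch: merge the watermark onto every page of the uploaded PDF
       and write the result to the output bucket."""
    # /tmp is the only writable (in-memory) file system in Cloud Functions.
    tmp_dir = tempfile.gettempdir()
    input_path = os.path.join(tmp_dir, 'input.pdf')
    watermark_path = os.path.join(tmp_dir, 'watermark.pdf')
    output_path = os.path.join(tmp_dir, 'watermarked.pdf')

    input_blob.download_to_filename(input_path)
    watermark_blob.download_to_filename(watermark_path)

    with open(input_path, 'rb') as input_file, open(watermark_path, 'rb') as watermark_file:
        watermark_page = PdfFileReader(watermark_file).getPage(0)
        reader = PdfFileReader(input_file)
        writer = PdfFileWriter()

        # Stamp the watermark page onto each page of the input PDF.
        for page_number in range(reader.getNumPages()):
            page = reader.getPage(page_number)
            page.mergePage(watermark_page)
            writer.addPage(page)

        with open(output_path, 'wb') as output_file:
            writer.write(output_file)

    # The output object name ('watermarked-<input name>') is an illustrative choice.
    output_bucket = storage_client.bucket(os.environ.get('WATERMARK_OUTPUT_BUCKET_NAME'))
    output_bucket.blob('watermarked-' + input_blob.name).upload_from_filename(output_path)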

A Cloud Function must have a declared entry point. In my example code this is called watermark_file.

Notice the function takes two parameters: event and context.

The “event” parameter is populated by the Cloud Functions runtime. It contains data about the event that triggered the function, such as the bucket and object name. The context object contains metadata about the triggering event, such as the event ID, event type and timestamp.
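
The print_function_meta_data helper called in the example is also part of the full source. A sketch of the kind of metadata it could log from these two parameters (the field names come from the Context object and the Cloud Storage event payload) might be:

def print_function_meta_data(context, event):
    # Metadata supplied by the Cloud Functions runtime.
    print(f'Event ID: {context.event_id}')
    print(f'Event type: {context.event_type}')
    print(f'Timestamp: {context.timestamp}')
    # Selected fields from the Cloud Storage object payload.
    print(f'Bucket: {event["bucket"]}')
    print(f'File: {event["name"]}')
    print(f'Created: {event["timeCreated"]}')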

Structuring code

For Python, your folder structure must include a file called main.py containing the entry point. You must also include a requirements.txt so that the Python package manager can install the function's dependencies (a sample is shown after the tree below).

├── src
│   ├── function
│   │   ├── main.py
│   │   └── requirements.txt
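
For this function, requirements.txt needs at least the Cloud Storage client library and PyPDF2. The version pins below are illustrative; the PyPDF2 1.x pin matches the PdfFileReader API used above.

# requirements.txt
google-cloud-storage==1.42.0
PyPDF2==1.26.0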

Deploying code

Cloud Functions can be deployed via any CI/CD platform. In this example I show how to deploy the function using the gcloud CLI:

cd src/function
gcloud functions deploy watermark_file \
  --runtime python39 \
  --trigger-bucket=storagebucket \
  --set-env-vars WATERMARK_OUTPUT_BUCKET_NAME=storageoutputbucket

Testing the Cloud function

To test this particular function you need to upload two PDFs. First upload the watermark PDF to the target Cloud Storage bucket, which is also where the final merged, watermarked PDF is written. Then upload the source PDF to watermark.

gsutil cp watermark.pdf gs://storageoutputbucket
gsutil cp input.pdf gs://storagebucket

If all goes well, the watermarked output PDF will be written to the output Cloud Storage bucket.
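
You can also check that the function ran (or see why it did not) by reading its logs with the gcloud CLI:

gcloud functions logs read watermark_file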

Code

You can find the full code on my GitHub page:

https://github.com/romeelk/watermark-cloud-function

Stateful functions

AWS offers stateful, orchestrated functions through AWS Step Functions. This is useful when you want to orchestrate a workflow where you need to manage state, checkpoints and restarts.

Unfortunately Google Cloud Functions does not provide this feature. However, if you are looking for a workflow orchestrator then check out Cloud Composer.

Final thoughts

From a development point of view I found GCP Cloud Functions very easy to set up. However, this is a very simple demonstration. When designing serverless functions it is important to always remember the following key points:

  • Cold starts. These will happen on the first request after a deployment and whenever a new instance is spun up.
  • Performance of your code and its memory consumption
  • Avoid running background tasks after the function returns
  • Instrument and performance test your functions
  • Consider using objects at global scope so they can be reused between function invocations (see the sketch below)
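
The example code already applies the last point: the Cloud Storage client is created at module (global) scope, so warm instances reuse it across invocations instead of recreating it each time. The same idea with lazy initialisation looks roughly like this (the handler name is a placeholder):

from google.cloud import storage

# Lazily-initialised global, reused across invocations on a warm instance.
storage_client = None

def handler(event, context):
    global storage_client
    if storage_client is None:
        # Only pay the client construction cost on the first invocation
        # handled by a given instance.
        storage_client = storage.Client()
    print(f'Client ready for project: {storage_client.project}')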
