
Arguably, deploying ML applications is fundamentally an infrastructure problem, and that problem has limited how many ML models make it to production. Configuring your infrastructure, maintaining your application over the long term, scaling it, and keeping costs down are all challenges that prevent many ML models from ever being deployed.

As ML Engineers, our priority should be to focus on our ML applications and not worry about how to put out fires. Turns out, there’s a technology that does just that for us. Enter… Serverless! Serverless lets you focus on building your apps without having to worry about the underlying infrastructure, thus increasing the speed at which you can build and deploy models to production.

In this guide, you will learn about the core concepts of serverless relevant for you as an ML engineer and you will also go on to deploy an image classification application on a serverless platform so you can see how easy and fast it is.

What is serverless?

Before we go on to define what serverless means, let’s understand how we got here:

Serverless definition | Source: modified and adapted from Youtube

We started with building and deploying applications on our own hosted servers and data centers, where we were responsible for configuring and managing these servers. With the advent of the Cloud and other remote computing platforms came virtual machines, which you could rent and use on demand. When containers became mainstream, so did container orchestration platforms for application infrastructure like Kubernetes. We are now moving towards the era of serverless, where we trade off the customization of our infrastructure for speed in development and deployment.

So what is serverless? Serverless is a fully managed approach to building, deploying, and managing your technology infrastructure. 

To fully grasp that definition, let’s understand what serverless offers you when you deploy your machine learning applications. With serverless, you are provided with:

  • Compute for your application without the need for owning, maintaining, or even provisioning underlying servers. 
  • Sophisticated networking and enterprise-level backend integration tools and services that automatically scale to your application workloads.
  • Persistence for durable storage without the need to manage your backups.
  • Managed versions of databases that can grow to meet the needs of your application.

How does serverless work?

Serverless platforms allow you to build and run your applications and services without thinking about servers. This is a new paradigm and there are two aspects to it: 

  • Bring your code for execution 
  • Integrate (or “glue”) your application code with other managed backend services.

For code execution, most serverless platforms will have a Function as a Service (FaaS) offering that will provide the function for you to execute your code on the serverless platform. This way, you can develop, run, and manage your application without worrying about the underlying infrastructure. You are working at the level of a function, not at the level of a container, or a server, or a virtual machine. 

Running your ML apps on serverless will also require that you integrate with other Backend-as-a-Service offerings such as databases, storage, messaging, and API management services. 

The paradigm changes that come with serverless include:

  • No need to manage infrastructure: You are only responsible for your application. The infrastructure is fully managed by the service provider.
  • Event-driven applications: Your architecture depends on events and stateless computing.
  • Pay for what you use: When nothing is happening, you pay nothing. When something does happen, you have fine-grained billing visibility.

Let’s set some things straight before you start planning on building and deploying your ML application on any serverless computing platform.

What do you do in “serverless”?

When building serverless applications, you are responsible for:

  • Writing code that will instruct your application on how to run (services to interact with, authentication management, where to dump outputs, and so forth). In most cases, you should keep your code minimal to avoid vulnerabilities in your application. 
  • Defining triggers that fire as soon as an event (an occurrence of an activity) happens. For example, if the storage service receives a file, it should spin up the serverless service which will, in turn, grab the file, run the program on it, and dump the output in another storage bucket.

What do you NOT do in “serverless”?

When building serverless applications, you are not responsible for:

  • Pre-provisioning the infrastructure for your application to run as that hassle should be handled by the service provider.
  • Patching and managing infrastructure, which is common with container-based infrastructure management. This is taken care of by the service provider and you are only required to maintain your code.
  • Defining scaling policies, as the service is expected to auto-scale depending on the usage of your application. You can also decide to set some compute settings that enable scaling but in most cases, it’s handled by the service provider.

Another important point to note when building or deploying any application on serverless environments is that you do not store state alongside the compute because compute in serverless is short-lived – your servers are spun up during an event. They are not always running, waiting for requests. This means its state is stored elsewhere, often in another managed service. This is also one of the reasons why serverless apps can scale easily – the compute is stateless allowing more servers to be added under the hood for horizontal scalability.
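To make the stateless idea concrete, here is a minimal Python sketch. The `StateStore` interface and the `InMemoryStore` class are hypothetical stand-ins for a managed service such as a database or cache; they are not part of any provider's API. The handler keeps nothing between invocations and reads and writes all state through the store it is given.

```python
from typing import Dict, Protocol


class StateStore(Protocol):
    """Hypothetical interface for external state (for example, a managed database)."""

    def get(self, key: str, default: int = 0) -> int: ...

    def put(self, key: str, value: int) -> None: ...


class InMemoryStore:
    """Stand-in for a managed service, used only so the sketch runs locally."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def get(self, key: str, default: int = 0) -> int:
        return self._data.get(key, default)

    def put(self, key: str, value: int) -> None:
        self._data[key] = value


def handle_request(event: dict, store: StateStore) -> int:
    """Count requests per client without keeping any state in the function itself."""
    client_id = event["client_id"]
    count = store.get(client_id) + 1
    store.put(client_id, count)
    return count
```

Because the count lives in the store rather than in the function, two calls for the same client could be served by two different short-lived instances and still produce a consistent result.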

To summarise this section, with serverless, you are getting an environment where you only need to deploy your code, connect your application to other backend managed services, and everything else is taken care of by the provider.

When should you consider deploying your ML models on serverless platforms?

Depending on the problem you are trying to solve, most ML applications are dynamic and would require continual updates and improvements as you learn what your model can do well and what it cannot. This fits the use case for serverless well because such platforms:

  • Can handle a high volume of activities. For example, your model can process prediction requests from a client and auto-scale based on the number of requests so turnaround time can remain low regardless of the number of requests being processed.
  • Are suited for dynamic workloads. You can quickly deploy a new model to the platform without configuring infrastructure or worrying about scaling. It’s like taking out an old model and replacing it with a new one.
  • Are provided on-demand compared to a virtual machine or container service that’s always on even without activity. This can save you money in the long run as it is likely that your ML service may not be as in-demand as your other services.
  • Are great for scheduled tasks. For your production use cases, you are probably going to be running scheduled tasks such as model retraining at certain intervals. With serverless workflows, you can automate the retraining and data processing steps to run on a time-based schedule.

Challenges of deploying your ML application on serverless

There are some challenges you might encounter when trying to deploy your ML application to a serverless environment.

Functions are ephemeral

Challenge: deploying to a serverless environment means your application will experience cold-starts, especially when the underlying server(s) is not already hosting a function when the event is triggered. This can cause latency issues when your application wants to process a new prediction request, especially for online models. This could also be an issue if you want to run your models offline too as your offline scoring must conclude in the window of time the instance is still running.

Potential solution: check that the FaaS offering can meet your required SLA (service level agreement). It is worth noting that once the function is warm, your model can usually return predictions in real time with much lower latency than the start-up latency of a cold start. Before choosing serverless, try to understand what you are optimizing for. Are you optimizing for real-time scoring? If yes, try to understand what an acceptable latency is.
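One common way to limit the cost of a cold start is to load the model once per function instance and reuse it on later (warm) invocations. The sketch below illustrates the pattern with a placeholder model; `get_model`, `handler`, and `timed_call` are illustrative names, and the `time.sleep` call only simulates an expensive load.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=1)
def get_model():
    """Load the model once per function instance and reuse it afterwards."""
    time.sleep(0.2)  # placeholder for an expensive load, such as reading model weights
    return lambda features: sum(features)  # placeholder "model"


def handler(event, context=None):
    model = get_model()  # slow on the first (cold) call, cached on warm calls
    return model(event["features"])


def timed_call(event):
    """Return the handler result together with the elapsed time in seconds."""
    start = time.perf_counter()
    result = handler(event)
    return result, time.perf_counter() - start
```

Calling `timed_call` twice on the same instance should show the first call paying the load time and the second call skipping it, which is the gap you would measure against your latency target.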

Dependency on remote storage

Challenge: since compute is stateless, no sessions from previous client requests are retained, so sessions need to be stored remotely in blob storage, a cache, a database, or an event broker. This could cause performance issues as the function will need to write to slower storage and read it back on every invocation. If you are running high-performance machine learning models, this could be an issue to take note of.

Potential solution: any state used by your function needs to be stored externally, and preferably to storage with reasonably high read-write performance.

I/O performance 

Challenge: functions are densely packed and share network I/O so processes that involve high throughput computation might not be ideal for it.

Potential solution: if you are running high-performance ML applications, you may want to consider using another computing platform like a virtual machine or a container service.

Generic hardware

Challenge: with most serverless platforms, what you have available is basic CPU and computing hardware powering the underlying servers. You likely will not find providers offering GPUs or TPUs through their serverless platforms. Although this is starting to change as AWS recently announced that Lambda functions are now powered by their Graviton2 processors. Regardless, there are still limits that exist for memory size and processor types. For example, at the time of writing this, you can allocate up to 10 GB of memory to a Lambda function and have access to 6 CPUs for computation.

Potential solution: if your ML model will require a hardware accelerator like a GPU or TPU to run, you might want to consider using a virtual machine instead.

Vendor lock-in

Challenge: while you could port your code, you can’t always port your architecture. With serverless, migration becomes a challenge, especially when your application integrates with other service offerings from the vendor. In most cases, you might find that the only way to migrate is to re-architect your production system. Also, serverless is an emerging space and there’s a risk in picking winners too early.

Potential solution: before choosing a vendor, consider the services they offer, the SLAs for those services, whether they are already the primary provider for your organization, and how easy it would be to move your workload.

Security risks

Challenge: while serverless platforms are fully-managed, there are still security risks associated with deploying your applications as functions. One of those is insecure configuration. There could be a problem with leaving the API interaction between your services and external connections vulnerable, which can result in critical security threats such as exposure to data leaks, Distributed Denial of Service (DDoS) attacks, and so on.

In other cases, giving services and users more access than they need can pose a threat too. While it might be okay to do this while you learn, ensure you clean your architecture and only give the least amount of privilege a service or user needs to complete a functional application. Another reason to be sensitive with risks around security and malicious attacks is that they could also lead to high bills.

Potential solution: ensure you follow cloud security best practices; an industry standard from AWS can be found in this documentation. Secure your credentials in a credential management service, preferably one offered natively by your serverless platform provider. Always follow the principle of least privilege for both your services and users.
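As an illustration of least privilege, the sketch below builds an IAM-style policy document that only allows the actions a function like the one in this guide would plausibly need. The function name, the statement IDs, and the exact action list are assumptions made for illustration; review them against what your own function actually calls.

```python
import json


def least_privilege_policy(bucket: str, table_arn: str) -> dict:
    """Build a policy limited to reading images and writing predictions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadInferenceImages",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:GetObjectVersion"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Sid": "WritePredictions",
                "Effect": "Allow",
                "Action": ["dynamodb:PutItem", "dynamodb:Scan"],
                "Resource": table_arn,
            },
        ],
    }


if __name__ == "__main__":
    # The account ID and region below are placeholders, not real values.
    policy = least_privilege_policy(
        "images-for-inference",
        "arn:aws:dynamodb:us-east-1:123456789012:table/PredictionsTable",
    )
    print(json.dumps(policy, indent=2))
```

Compared with attaching full-access managed policies, a document like this names specific actions and specific resources, so a leaked credential or a misbehaving function can do far less damage. A real function would also need permission to write its logs.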

Limited monitoring and observability

Challenge: observability of the app, the model, and how they contribute to the business goals is crucial. This is hard to achieve with serverless applications because the logging and tracing tools you are provided with may not be sufficient to give you complete oversight of your application.

Potential solution: you might want to integrate with third-party ML monitoring tools for more effective oversight of your application in production.

Other risks

Challenge: there are also risks that are native to ML applications, such as adversarial attacks, especially with computer vision applications. While this is not peculiar to serverless ML apps, protecting models from adversarial attacks is quite difficult to do with serverless as you do not control the underlying infrastructure your model runs on. For example, attacks could come from malicious inputs sent as requests to your model.

Potential solution: you may want to try building a more robust model with the best academic or industry approaches available out there as your security infrastructure should only be a secondary layer in protecting your model.

Serverless platform providers

There are several serverless platform providers in the market today. In this guide, you’ll learn how to deploy your application using Amazon Web Services serverless platform and services. You will also get a brief overview of other popular public cloud solutions such as Google Cloud Platform serverless FaaS offering and Microsoft Azure. Currently, these are the 3 major players in the market.

1. Google Cloud Functions

Google Cloud’s hosted FaaS offering is called Cloud Functions. The supported languages at the time of writing this include Node.js, Python, PHP, .NET, Java, Ruby, and Go. The available triggers for Cloud Functions are HTTP requests, messaging through Pub/Sub, Storage, Firestore, Firebase, schedulers, and modern data services.

Some of the features are:

  • Auto-scaling functions that automatically scale to your workloads, with a “max” option you can set if you don’t want too many instances running.
  • You can run each of your functions under unique identities, helping you to have some more fine-grained control.
  • Cloud Run is available to let you run any stateless request-driven container on serverless platforms in Google Cloud. 

2. Microsoft Azure Serverless

Azure Functions is Azure’s function-as-a-service platform for serverless applications. It supports several languages including C#, F#, Java, and Node. The available triggers for Azure Functions include Blob Storage, Cosmos DB, Event Grid, Event Hubs, HTTP requests, Service Bus (messaging system), and Timer (time-based events).

Some of the features are:

3. AWS Lambda

AWS Lambda is a hosted function-as-a-service (FaaS) platform from Amazon Web Services. You can write your functions in a variety of languages such as Node, Java, C#, and Python. The available triggers for Lambda include Kinesis, DynamoDB, SQS, S3, CloudWatch, CodeCommit, Cognito, Lex, and API Gateway.

In this guide, you will learn how to deploy an image classification model to a serverless environment on AWS to build a serverless ML application.

Deploying an image classification model on serverless [DEMO]

Understanding the model to be deployed

We will use a pre-trained model which classifies 3 categories of moth species that mostly plague African maize crops: African armyworm (AAW), Fall armyworm (FAW), and Egyptian cotton leafworm (ECLW).


Here, we want to build a mobile application that can query an ML model to return results based on new images of moth species uploaded to external storage (could be from other external applications like a camera on a farm). The ML model needs to be continually improved, as this is just the first version to collect more moth image data. The ML application on the backend should be able to handle unstructured data (image) that can come in both batch and stream, and occasionally in bursts as well while keeping costs as low as possible.

As you may have guessed, this sounds like a job for serverless!

Model details

The model was trained using TensorFlow 2.4.0 and it uses ResNet-50 model weights with some custom layers added as final layers for fine-tuning. The model expects an input image of 224×224 size and the output layer returns only 3 classes based on the problem we are solving.

The model has been compressed to a tar.gz archive file and uploaded to a folder within an S3 bucket. To ensure the file is readable, the object was given public read access.

If you are following along with your model, you can learn how to give public read access here. Also, ensure your model gets saved properly. This model was saved using the `` format and compressed to a `.tar` file (preferably).


A few important things you need to have before getting started:

  • A free tier AWS account or an active account with up to $3 in it. Following along with this guide should cost less than $3 as of the time of publication.
  • Basic experience using AWS cloud.
  • A fully configured command-line interface tool to work with your IDE. You can install the full AWS toolkit for VSCode here.
  • A little familiarity with Docker CLI.
  • An active IAM user (for best security practices). I’d advise you not to use the root user for this technical guide. You can provide access to the services listed below for the user:

Here’s an example of the policies attached to the user I created for this role (excluding the full access to DynamoDB):

List of attached policies to the user | Source: Author

The second Elastic Container Registry (ECR) policy is the public policy. If you want to be able to share your container software with anyone in the world, you may want to include this policy. Also, to make this guide easy to follow, I have allowed all resources and full access for this user – if you are applying this to your project, you may only want to give the least privilege necessary to complete the project to avoid security issues.

Overview of the solution

For this technical guide, the solution we will build will use two workflows: Continuous Integration/Delivery (CI/CD) workflow and the main deployment workflow. Here is our entire workflow:

  • We will use a CI/CD pipeline to build and package our application on the Cloud.
  • We will then deploy our packaged application to a serverless function that integrates with other services to provide a fully managed backend production system.

Continuous Integration/Continuous Delivery workflow

Here’s what our CI/CD architecture will look like:

Continuous integration/ delivery workflow | Source: Author

Here are the steps we will take in this workflow:

  1. The IAM user will use the AWS CLI and a git client to push our application code and some configuration files to a private CodeCommit repository.
  2. This new commit will trigger CodePipeline to create a new build of the Docker image from our Dockerfile and a specified build configuration using CodeBuild.
  3. Once the build is done and has succeeded, the Docker image will be pushed to a private registry in AWS ECR.
  4. You can now use the Docker image to create the Lambda function that’ll serve as an inference endpoint in production.

While it’s important to run your model testing and validation in this pipeline, this guide does not cover the testing procedures. 

Deployment workflow

Here’s what our deployment architecture will look like:

Serverless backend deployment workflow | Source: Author

Here are the steps in this workflow:

  1. Client uploads an image to an S3 bucket through the REST API endpoint or management console.
  2. EventBridge triggers the Lambda Function hosting our model.
  3. The Function streams logs to Amazon CloudWatch Logs.
  4. The Function also writes the results to DynamoDB.

You will create a Lambda function inference endpoint with the Docker image you built from your ECR repository. At this point, you will test the function before deploying it. Once it’s deployed, you can now perform an end-to-end test by uploading a new image to the S3 bucket you will create to ensure everything works as expected.
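When you test the function before deploying it, you need an event that resembles what the function will receive. The sketch below builds a minimal event containing only the fields the handler in this guide reads (`Records[0].s3.bucket.name`, `object.key`, and `object.versionId`). The helper name `make_s3_put_event` is hypothetical, and a real event carries many more fields than shown here.

```python
from urllib.parse import quote_plus


def make_s3_put_event(bucket: str, key: str, version_id: str = "") -> dict:
    """Build a minimal S3-style upload event for local or console testing."""
    # Object keys arrive URL-encoded in S3 events, with spaces encoded as "+".
    s3_object = {"key": quote_plus(key, safe="/")}
    if version_id:
        s3_object["versionId"] = version_id
    return {
        "Records": [
            {
                "eventSource": "aws:s3",
                "eventName": "ObjectCreated:Put",
                "s3": {"bucket": {"name": bucket}, "object": s3_object},
            }
        ]
    }


if __name__ == "__main__":
    import json

    event = make_s3_put_event("images-for-inference", "moths/fall armyworm.jpg", "abc123")
    print(json.dumps(event, indent=2))
```

The printed JSON can be pasted into a test event in the console, or passed directly to the handler in a local unit test.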

This is the overview of the solution you are going to build in this guide. Enough talk, let’s solve some problems!

Building the solution

To implement the solution, clone the repository for this guide. Before we review the application code and configuration files, let’s set up some of the AWS services we will be using for this solution. Ensure your user has the necessary permissions for these services and is working in the right region.

Create an S3 bucket to hold inference images

  1. To use the management console, go to the Amazon S3 console. To use the CLI, check out the documentation and follow along.
  2. Click on Create Bucket on the right side of your screen.
  3. Enter a descriptive name for your bucket and select the appropriate region:
Create the S3 bucket to hold inference images | Source: Author

4. Under Block Public Access settings for this bucket, uncheck Block all public access and check the box to agree to the terms below:

Block public access settings for the bucket | Source: Author

5. Next, enable Bucket Versioning so new images with the same name are not overwritten (especially if your app developer does not cater for that):

Enable bucket versioning | Source: Author

6. Leave other settings on default and click on Create bucket:

Create bucket | Source: Author

For this guide, we will give public read access in case you want your application to return the image that was predicted by your model. Since the data isn’t sensitive, we can get away with it, but for security reasons you may want to allow only authenticated users or services to view the data. For write access, it depends on which devices or users you want to upload images for inference. If you have specific devices, then grant access to only them.

7. Click on the Permissions tab:

Go to permissions tab | Source: Author

8. Under the permissions tab, go to Bucket policy and click Edit:

Go to bucket policy to edit | Source: Author

9. Enter the following policy and change `<REPLACE WITH THE NAME OF YOUR BUCKET>` to the name of your bucket:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": "arn:aws:s3:::<REPLACE WITH THE NAME OF YOUR BUCKET>/*"
        }
    ]
}

Your Policy Editor should now look like this:

Policy editor | Source: Author

10. Scroll down and click on Save changes:

Save the changes | Source: Author

That’s it! Objects in your bucket should now be publicly accessible (to read). This will be crucial for us to return the image URL.

Create a DynamoDB table to store predictions

Earlier in this guide, we learned that serverless functions are stateless. We need to store our predictions somewhere so it’s easier to communicate with our application. We will use Amazon’s NoSQL serverless database DynamoDB for this as it scales pretty well and integrates with the serverless function quite easily.

As is the case with any managed service you are choosing, you have to make sure it fits your application requirements. In our case, we will need to store the following fields:

  • PredictionDate: The date the model made the predictions.
  • ClassName: The name of the moth the model predicted.
  • ClassPredictionID: The unique ID of the prediction made by the model. This will help when we monitor and manage our model in production.
  • CaptureTime: The time the prediction was made.
  • ImageURL_ClassName: The public URL of the image that was predicted.
  • ConfidenceScore: The probability score of the model.
  • Count_ClassName: The number of times the same class was predicted on a given PredictionDate.

List of the fields with a description | Source: Author

To create a DynamoDB table, you only need to define two fields up front: the partition key and the sort key, which together form the primary key. Other attributes can be added to the table when the application is being tested.

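To create the table through the CLI, ensure the CLI is configured with the right account details and use the command below. The key names shown are an assumption based on the fields the Lambda function writes (`PredictionDate` as the partition key and `ClassPredictionID` as the sort key):

aws dynamodb create-table \
    --table-name PredictionsTable \
    --attribute-definitions AttributeName=PredictionDate,AttributeType=S AttributeName=ClassPredictionID,AttributeType=S \
    --key-schema AttributeName=PredictionDate,KeyType=HASH AttributeName=ClassPredictionID,KeyType=RANGE --billing-mode PAY_PER_REQUEST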

You can also use the management console or do it programmatically through the Python SDK. To check if the table is active, run:

aws dynamodb describe-table --table-name PredictionsTable | grep TableStatus
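For the Python SDK route mentioned above, a minimal sketch with boto3 could look like the following. The choice of `PredictionDate` as the partition key and `ClassPredictionID` as the sort key is an assumption based on the fields the Lambda function writes, and on-demand billing is used here only to keep the example short. The helper names are illustrative.

```python
def build_table_definition(table_name: str = "PredictionsTable") -> dict:
    """Return the arguments for creating the predictions table."""
    return {
        "TableName": table_name,
        # Only key attributes are declared; other attributes are added per item.
        "AttributeDefinitions": [
            {"AttributeName": "PredictionDate", "AttributeType": "S"},
            {"AttributeName": "ClassPredictionID", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "PredictionDate", "KeyType": "HASH"},  # partition key (assumed)
            {"AttributeName": "ClassPredictionID", "KeyType": "RANGE"},  # sort key (assumed)
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def create_table(table_name: str = "PredictionsTable") -> str:
    """Create the table and return its status once it exists (needs AWS credentials)."""
    import boto3  # imported lazily so the definition above can be inspected offline

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.create_table(**build_table_definition(table_name))
    table.wait_until_exists()
    return table.table_status
```

Calling `create_table()` from a configured environment should return `ACTIVE` once the table is ready, which matches what the `describe-table` command reports.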

Go to the DynamoDB console and check the available table. If the table was created successfully, you should see:

Successful operation view | Source: Author

Create an Elastic Container Registry repository

To create an ECR private repository using the CLI, enter the following in your AWS-configured command line replacing `<REGION>` with the region for your project:

aws ecr create-repository \
    --repository-name image-app-repo \
    --image-scanning-configuration scanOnPush=true \
    --region <REGION>

You can also create a private repo from the management console. You should see an output similar to the one below:

Output | Source: Author

Check the console to confirm your newly created ECR repository. You should see a similar view:

Confirm the newly created ECR repository | Source: Author

Copy your repository URI and save it somewhere, because you will need it in the next section.

Inspect the application code and configuration files

To follow along, ensure you are in your AWS CLI environment (you may also choose to create a virtual environment). Complete the steps below:

  1. On your local computer, create a new folder and name it `image-classification-app`.
  2. In that folder, create a new `requirements.txt` file. You’ll add the external libraries used by your application that aren’t native to the Lambda Python runtime. This function will use the `python3.8` runtime.
  3. Create an `` script that contains the code for the Lambda function and the glue code integrating your function with other services.
  4. Create a `Dockerfile` in the same directory.
  5. Also, create a `buildspec.yml` configuration file that CodeBuild will use to build the Docker image.

Here is what `requirements.txt` looks like:


In this application, we will use TensorFlow 2.4 as our model was built with a TensorFlow 2 version. We will also use pytz to ensure our application time is correctly formatted.

Next, we will take a look at the code for our Lambda function. The full code can be found in this repository. We start with the Lambda handler: once an event triggers the function, it runs this script to collect the event details from the S3 bucket containing the images for inference (`images-for-inference`). Also, in case the client uploads an image with the same name, the object will be versioned using `versionId` instead of being overwritten.

def lambda_handler(event, context):

  bucket_name = event['Records'][0]['s3']['bucket']['name']
  key = unquote(event['Records'][0]['s3']['object']['key'])

  versionId = unquote(event['Records'][0]['s3']['object']['versionId'])
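As a hedged variation on the parsing step above, the sketch below pulls the same three values out of the event with two small safeguards: it decodes the key with `unquote_plus` (S3 event keys encode spaces as `+`), and it tolerates a missing `versionId`, which is absent when bucket versioning is off. The `S3Object` type and `parse_s3_record` helper are illustrative names, not part of the original code.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote_plus


class S3Object(NamedTuple):
    bucket: str
    key: str
    version_id: Optional[str]


def parse_s3_record(event: dict) -> S3Object:
    """Extract bucket, decoded key, and optional version ID from the first record."""
    s3 = event["Records"][0]["s3"]
    return S3Object(
        bucket=s3["bucket"]["name"],
        key=unquote_plus(s3["object"]["key"]),
        version_id=s3["object"].get("versionId"),  # None if versioning is disabled
    )
```

Inside the handler this could be used as `bucket_name, key, versionId = parse_s3_record(event)`, keeping the rest of the code unchanged.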

Store the class names in a variable and write the code to preprocess the image before feeding it to the model. The input layer of the model expects an image of size 224×224:

  class_names = ['AAW', 'ECLW', 'FAW']
  image = readImageFromBucket(key, bucket_name).resize((224, 224))
  image = image.convert('RGB')
  image = np.asarray(image)
  image = image.flatten()
  image = image.reshape(1, 224, 224, 3)

Make predictions with the model along with the probability score. Get the predicted class name as well:

  prediction = model.predict(image)
  pred_probability = "{:2.0f}%".format(100*np.max(prediction))
  index = np.argmax(prediction[0], axis=-1)
  predicted_class = class_names[index]
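To see what the post-processing does without loading the model, here is a small pure-Python sketch of the same logic: pick the index of the highest probability, map it to a class name, and format the confidence as a percentage. The `top_prediction` helper is an illustrative name, and the probabilities in the usage line are made up.

```python
from typing import Sequence, Tuple

CLASS_NAMES = ["AAW", "ECLW", "FAW"]


def top_prediction(probabilities: Sequence[float]) -> Tuple[str, str]:
    """Return the most likely class name and its confidence as a percentage string."""
    if len(probabilities) != len(CLASS_NAMES):
        raise ValueError("expected one probability per class")
    # Equivalent to np.argmax for a single row of model output.
    index = max(range(len(probabilities)), key=probabilities.__getitem__)
    confidence = "{:2.0f}%".format(100 * probabilities[index])
    return CLASS_NAMES[index], confidence


if __name__ == "__main__":
    print(top_prediction([0.05, 0.15, 0.80]))  # ('FAW', '80%')
```

Note that the order of `CLASS_NAMES` has to match the order of the classes the model was trained with, otherwise the index maps to the wrong label.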

After making the prediction, the code checks the DynamoDB table to see if the same prediction has already been made for that day. If it has, the application will update the count of the predicted class for that day and make a new entry with the updated count. Note that if you are planning to store lots of items in this database, then using `table.scan` could be inefficient and costly for you. You might have to find a way to write your logic with `table.query` or some other means such as the `GetItem` and `BatchGetItem` APIs.

  for i in class_names:

    if predicted_class == i: 

      # Match entries for the same class on the same day
      # (the date filter is assumed from the description above).
      details = table.scan(
          FilterExpression=Attr("PredictionDate").eq(date)
          & Attr("ClassName").eq(predicted_class),
      )

      if details['Count'] > 0 and details['Items'][0]['ClassName'] == predicted_class:

        # Use the entry with the highest running count for today.
        event = max(details['Items'], key=lambda ev: ev['Count_ClassName'])

        current_count = event['Count_ClassName']
        updated_count = current_count + 1
        table_items = table.put_item(
            Item={
              'PredictionDate': date,
              'ClassPredictionID': predicted_class + "-" + str(uuid.uuid4()), 
              'ClassName': predicted_class,
              'Count_ClassName': updated_count,
              'CaptureTime': time,
              'ImageURL_ClassName': img_url,
              'ConfidenceScore': pred_probability
            }
        )
        print("Updated existing object...")
        return table_items

If this is the first prediction for that day, the code adds it as a fresh item to the table.

      elif details['Count'] == 0:
        new_count = 1
        table_items = table.put_item(
            Item={
                'PredictionDate': date,
                'ClassPredictionID': predicted_class + "-" + str(uuid.uuid4()), 
                'ClassName': predicted_class,
                'Count_ClassName': new_count,
                'CaptureTime': time,
                'ImageURL_ClassName': img_url,
                'ConfidenceScore': pred_probability
            }
        )
        print("Added new object!")
        return table_items

  print("Updated model predictions successfully!")
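Following up on the earlier note about `table.scan`, here is a rough sketch of how the same lookup could be expressed with `table.query`. It assumes `PredictionDate` is the table's partition key, which this guide does not confirm, and it ignores pagination for brevity. The helper names are illustrative; `table` is expected to be a boto3 DynamoDB `Table` resource like the one used in the Lambda code.

```python
def build_daily_class_query(date: str, class_name: str) -> dict:
    """Build query arguments for one day's entries of a single class."""
    return {
        # Only the partition for this date is read, instead of the whole table.
        "KeyConditionExpression": "PredictionDate = :d",
        "FilterExpression": "ClassName = :c",
        "ExpressionAttributeValues": {":d": date, ":c": class_name},
    }


def highest_count(items: list) -> int:
    """Return the largest Count_ClassName among the items, or 0 if there are none."""
    return max((int(item["Count_ClassName"]) for item in items), default=0)


def todays_count(table, date: str, class_name: str) -> int:
    """Look up how many times class_name has been recorded on the given date."""
    response = table.query(**build_daily_class_query(date, class_name))
    return highest_count(response["Items"])
```

With a helper like this, the handler could compute `updated_count = todays_count(table, date, predicted_class) + 1` and write a single new item, which also covers the first prediction of the day since the count starts from 0.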

This code reads the image from the bucket and returns it as a Pillow image that will be used by the Lambda function:

def readImageFromBucket(key, bucket_name):
  """
  Read the image from the triggering bucket.
  :param key: object key
  :param bucket_name: Name of the triggering bucket.
  :return: Pillow image of the object.
  """

  bucket = s3.Bucket(bucket_name)
  object = bucket.Object(key)
  response = object.get()
  return Image.open(response['Body'])

Complete File

Your completed code (also in this repository) should look similar to what is below. Ensure you replace `<REPLACE WITH YOUR TABLE NAME>` with your DynamoDB table name and `<REPLACE WITH YOUR BUCKET FOR INFERENCE IMAGES>` with the S3 bucket you created earlier:

import json
import boto3
import datetime
import numpy as np
import PIL.Image as Image
import uuid
import pytz
import tensorflow as tf

from urllib.parse import unquote_plus as unquote  # S3 event keys encode spaces as '+'
from pathlib import Path
from decimal import Decimal
from botocore.exceptions import ClientError
from boto3.dynamodb.conditions import Key, Attr
from datetime import datetime as dt

tz = pytz.timezone('Africa/Lagos')
# Current local time, formatted as "YYYY-MM-DD HH:MM:SS".
date_time = str(dt.now(tz).strftime('%Y-%m-%d %H:%M:%S'))

date = date_time.split(" ")[0]
time = date_time.split(" ")[1]

timestamp = Decimal(str(dt.timestamp(dt.now(tz))))

import_path = "model/"

model = tf.keras.models.load_model(import_path)



s3 = boto3.resource('s3')
dynamodb = boto3.resource('dynamodb')

table = dynamodb.Table('<REPLACE WITH YOUR TABLE NAME>')

def lambda_handler(event, context):

  bucket_name = event['Records'][0]['s3']['bucket']['name']
  key = unquote(event['Records'][0]['s3']['object']['key'])

  versionId = unquote(event['Records'][0]['s3']['object']['versionId'])

  class_names = ['AAW', 'ECLW', 'FAW']
  image = readImageFromBucket(key, bucket_name).resize((224, 224))
  image = image.convert('RGB')
  image = np.asarray(image)
  image = image.flatten()
  image = image.reshape(1, 224, 224, 3)

  prediction = model.predict(image)
  pred_probability = "{:2.0f}%".format(100*np.max(prediction))
  index = np.argmax(prediction[0], axis=-1)
  predicted_class = class_names[index]

  print('ImageName: {0}, Model Prediction: {1}'.format(key, predicted_class))

  img_url = f"https://<REPLACE WITH YOUR BUCKET FOR INFERENCE IMAGES>.s3.<REGION>.amazonaws.com/{key}?versionId={versionId}"


  for i in class_names:

    if predicted_class == i: 

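      # Assumed filter: predictions for this class on the current date (the date condition is a guess).
      details = table.scan(
          FilterExpression=Key('PredictionDate').eq(date)
          & Attr("ClassName").eq(predicted_class),
      )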

      if details['Count'] > 0 and details['Items'][0]['ClassName'] == predicted_class:

        event = max(details['Items'], key=lambda ev: ev['Count_ClassName'])

        current_count = event['Count_ClassName']
        updated_count = current_count + 1
        table_items = table.put_item(
            Item={
              'PredictionDate': date,
              'ClassPredictionID': predicted_class + "-" + str(uuid.uuid4()),
              'ClassName': predicted_class,
              'Count_ClassName': updated_count,
              'CaptureTime': time,
              'ImageURL_ClassName': img_url,
              'ConfidenceScore': pred_probability
            }
        )
        print("Updated existing object...")
        return table_items
      elif details['Count'] == 0:
        new_count = 1
        table_items = table.put_item(
            Item={
                'PredictionDate': date,
                'ClassPredictionID': predicted_class + "-" + str(uuid.uuid4()),
                'ClassName': predicted_class,
                'Count_ClassName': new_count,
                'CaptureTime': time,
                'ImageURL_ClassName': img_url,
                'ConfidenceScore': pred_probability
            }
        )
        print("Added new object!")
        return table_items

  print("Updated model predictions successfully!")

def readImageFromBucket(key, bucket_name):
  """
  Read the image from the triggering bucket.
  :param key: object key
  :param bucket_name: Name of the triggering bucket.
  :return: Pillow image of the object.
  """
  bucket = s3.Bucket(bucket_name)
  object = bucket.Object(key)
  response = object.get()
  # The response body is a file-like stream that Pillow can open directly.
  return Image.open(response['Body'])
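
The handler in the file above pulls the bucket name, object key, and version ID out of the S3 event record and then builds a URL for the image. Here is a minimal sketch of that step on its own, which you can run without AWS. The sample event, the region `us-east-2`, and the helper names `parse_s3_record` and `object_url` are assumptions for illustration. The sketch uses `unquote_plus` instead of `unquote` because S3 event notifications URL-encode object keys and send spaces as `+`.

```python
from urllib.parse import quote, unquote_plus


def parse_s3_record(event):
    """Return (bucket, key, version_id) from the first record of an S3 event."""
    s3_info = event["Records"][0]["s3"]
    bucket = s3_info["bucket"]["name"]
    # Keys arrive URL-encoded, with spaces sent as '+'.
    key = unquote_plus(s3_info["object"]["key"])
    # versionId is only present when versioning is enabled on the bucket.
    version_id = s3_info["object"].get("versionId")
    return bucket, key, version_id


def object_url(bucket, region, key, version_id=None):
    """Build a virtual-hosted-style URL for an S3 object."""
    url = f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"
    if version_id:
        url += f"?versionId={quote(version_id, safe='')}"
    return url


# Hypothetical event, trimmed to the fields the handler reads.
sample_event = {
    "Records": [{
        "s3": {
            "bucket": {"name": "images-for-inference"},
            "object": {"key": "test+images/faw+01.jpg", "versionId": "abc123"},
        }
    }]
}

bucket, key, version_id = parse_s3_record(sample_event)
print(object_url(bucket, "us-east-2", key, version_id))
```

Reading `versionId` with `.get()` keeps the function from failing with a `KeyError` on buckets where versioning is turned off.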

Inspect the `Dockerfile` and replace `<YOUR NAME>` with your name and `<YOUR EMAIL>` with your email.


LABEL version="1.0"
LABEL description="Demo moth classification application for serverless deployment for technical guide."

RUN yum -y install tar gzip zlib freetype-devel \
    && yum clean all

COPY requirements.txt ./

RUN python3.8 -m pip install -r requirements.txt

RUN pip uninstall -y pillow && CC="cc -mavx2" pip install -U --force-reinstall pillow-simd


RUN mkdir model
RUN curl -L -o ./model/resnet.tar.gz
RUN tar -xf model/resnet.tar.gz -C model/
RUN rm -r model/resnet.tar.gz

CMD ["app.lambda_handler"]

Inspect the `buildspec.yml` file and replace `<REPLACE WITH YOUR ECR REPO URI>`  with the URI of the ECR repository you created in the previous section:

version: 0.2

phases:
  install:
    runtime-versions:
      python: 3.8
  pre_build:
    commands:
      - echo Logging in to Amazon ECR...
      - pip install --upgrade awscli==1.18.17
      - aws --version
      - $(aws ecr get-login --region $AWS_DEFAULT_REGION --no-include-email)
      - REPOSITORY_URI=<REPLACE WITH YOUR ECR REPO URI>
      - IMAGE_TAG=build-$(echo $CODEBUILD_BUILD_ID | awk -F":" '{print $2}')
  build:
    commands:
      - echo Build started on `date`
      - echo Building the Docker image...
      - docker build -t $REPOSITORY_URI:latest .
      - docker tag $REPOSITORY_URI:latest $REPOSITORY_URI:$IMAGE_TAG
  post_build:
    commands:
      - echo Build completed on `date`
      - echo Pushing the Docker images...
      - docker push $REPOSITORY_URI:latest
      - docker push $REPOSITORY_URI:$IMAGE_TAG
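
The `IMAGE_TAG` line in the buildspec takes the part of `CODEBUILD_BUILD_ID` after the colon and prefixes it with `build-`, so every build pushes a uniquely tagged image alongside `latest`. Here is a small Python sketch of the same string handling, in case the `awk` one-liner is unfamiliar. The build ID below is a made-up value in the `<project-name>:<build-uuid>` form, and the function name is hypothetical.

```python
def image_tag_from_build_id(build_id):
    """Mirror `awk -F":" '{print $2}'`: take the second colon-separated field."""
    fields = build_id.split(":")
    # awk prints an empty string when the second field does not exist.
    second = fields[1] if len(fields) > 1 else ""
    return f"build-{second}"


# Hypothetical build ID; CodeBuild sets the real one in the build environment.
print(image_tag_from_build_id("image-app-build:1a2b3c4d-0000-4000-8000-123456789abc"))
```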

That’s it! Compare your code with the one in the repository for this guide. We will now set up the CI/CD pipeline to push our application code and config files.

Create CodeCommit Git repository

1. Follow the instructions on this page to create your git credentials for CodeCommit.

2. Once you have created your git credentials, open the CodeCommit console at

3. Click on Create repository:

Create repository | Source: Author

4. Enter a Repository name, a description, leave the rest of the options as default and click Create:

Add repository details | Source: Author

Confirm your newly created repo on the page that shows up and check out the information on how to clone your repo with git. You should see the following:

Clone your repo with git | Source: Author

5. To make your first commit, ensure you are in the folder you created earlier in the guide, `image-classification-app`, and enter the following (assuming you followed the naming conventions in this guide and are using the same region):

git clone ../image-classification-app/

If successfully cloned, you should now have your code ready to be committed and pushed to the CI/CD pipeline.

6. Inside the `image-classification-app` folder, check to see what files have not been committed yet using:

git status

7. Stage all the files you want to commit in the folder:

git add *

8. Check to see if the files are ready to be committed:

git status

You should see a similar output:

Create CodeCommit Git repository - output
Output | Source: Author

9. Optionally, you may need to configure your name and email for this commit. Replace the placeholders below with your details:

git config user.email "<YOUR EMAIL ADDRESS>"
git config user.name "<YOUR NAME>"

10. Commit the files and include a message.

git commit -m "Initial commit to CodeCommit."

You should see a similar output:

Create CodeCommit Git repository - output

Now your code and configuration files are ready to be pushed to the CodeCommit repository you created earlier. To push them to the repository, follow the instructions below:

11. Use the following command to push your commits to the master branch upstream:

git push -u origin master

Your command line or IDE may require you to enter the git credentials you created earlier. Ensure you enter the correct details. If your commits were pushed successfully, you should see a similar output:

Create CodeCommit Git repository - output

Build container image with CodeBuild

To build our container image, we will have to create a project with CodeBuild.

1. Head over to

2. Select Build project and click Create build project:

Create build project | Source: Author

3. Under Project configuration, enter your project name and the description of your project:

Add project details | Source: Author

4. Under Source, confirm AWS CodeCommit is selected, and select your source repository (`image-app-repo`). Under Reference type, select Branch, and choose the master branch.

Add source details | Source: Author

5. Under Environment, ensure Managed image is selected. Under Operating system, select Ubuntu. Select the standard runtime and choose the standard:5.0 image. Under Image version, ensure Always use the latest image for this runtime version is selected. Under Environment type, select Linux and click the checkbox under Privileged since you are building a Docker image:

Add environment details | Source: Author

6. If you do not have an existing service role, ensure New service role is selected and leave others here as the default unless you want additional configurations:

Select new service role | Source: Author

7. Under Buildspec, ensure Use a buildspec file is selected. Leave others on default, including the options under Batch configuration:

Ensure use a buildspec file is selected | Source: Author

8. You don’t have any artifact included in your application so you can leave the options under Artifacts as their default. Under Logs, ensure CloudWatch logs is checked. Enter a descriptive group name and stream name. Confirm all your options and select Create build project:

Create build project | Source: Author

If your project was created successfully, you should see a page similar to this:

Successful new project view | Source: Author

If you want to get notified when a build is complete, you can click on Create a notification rule for this project.

9. Before starting the build for your project, you need to attach a new policy to the CodeBuild role you created. Head over to your IAM page, click Roles, and search for the CodeBuild role you created:

IAM page
Attach new policy | Source: Author

10. Click on Attach Policies

Attach policies
Select attached policies | Source: Author

11. Search for the AmazonEC2ContainerRegistryPowerUser policy. Click on the checkmark next to it and at the end of the page, click Attach policy:

Attach AmazonEC2ContainerRegistryPowerUser policy | Source: Author

12. On the next page, confirm the policy has indeed been added:

Confirm attached policies | Source: Author

13. Go back to the CodeBuild page for the project you created and click on Start build to test your application build. If your application build was successful, you should see a page similar to this:

Successful build status view | Source: Author

If your build fails, the build logs are the first place to look when troubleshooting. If you followed the steps in this guide correctly, you should have a successful application build. You can go back to the Elastic Container Registry (ECR) page for `image-app-repo` (or the ECR repo you created) to confirm a Docker image was indeed created.
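
If you would rather confirm the image from a script than from the console, a sketch along these lines could work. It assumes boto3 is installed and AWS credentials are configured. `describe_images` is the ECR call that lists images in a repository, and the helper names and sample data are hypothetical. The part that picks the newest image is kept separate so it runs without AWS access.

```python
from datetime import datetime, timezone


def newest_image(image_details):
    """Return the most recently pushed entry from an ECR imageDetails list, or None."""
    if not image_details:
        return None
    return max(image_details, key=lambda detail: detail["imagePushedAt"])


def fetch_image_details(repository_name):
    """Ask ECR for the images in a repository (first page of results only)."""
    import boto3  # imported lazily so newest_image() works without AWS access

    ecr = boto3.client("ecr")
    return ecr.describe_images(repositoryName=repository_name)["imageDetails"]


# Hypothetical data shaped like ECR's imageDetails entries.
sample = [
    {"imageTags": ["build-aaa"], "imagePushedAt": datetime(2021, 11, 1, tzinfo=timezone.utc)},
    {"imageTags": ["latest", "build-bbb"], "imagePushedAt": datetime(2021, 11, 2, tzinfo=timezone.utc)},
]
print(newest_image(sample)["imageTags"])
```

With credentials in place, `newest_image(fetch_image_details("image-app-repo"))` would return the latest push for the repository used in this guide.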

Automate application build with CodePipeline

You don’t want to rely on manual steps to deploy the serverless components. To automate the build process when you push a new commit to CodeCommit, you can use CodePipeline.

1. Go to the CodePipeline management console

2. Ensure Pipelines is selected and click on Create pipeline:

Create pipeline | Source: Author

3. Enter your pipeline name. If you do not have an existing service role, select New service role and enter a role name. Leave others as default and click on Next:

Choose pipeline settings | Source: Author

4. In the next step, select AWS CodeCommit, enter a repository name, select the master branch and leave other options as default:

Add source stage | Source: Author

5. Under the Add build stage step, select AWS CodeBuild as the build provider. Ensure you select your project region and the CodeBuild project you created in the previous section. Leave other options as default and click Next:

Add build stage | Source: Author

6. Because we are going to deploy our image manually, we have to skip the continuous deployment step in the pipeline. Click Skip deploy stage and, when the Skip deployment stage dialog box pops up, click Skip:

Add deploy stage | Source: Author

7. Finally, review your settings and click Create pipeline:

Create pipeline | Source: Author

8. You should now be on a page where your application build process has automatically started:

Successful new pipeline view | Source: Author

Check back in 5 minutes and if the build was successful, you should see a Succeeded notification.
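
Instead of refreshing the console, you could poll the pipeline from a script. The sketch below assumes boto3 and configured credentials for the live call. `get_pipeline_state` is the CodePipeline API that reports per-stage status. The summarizing helper, the sample response, and the `NotStarted` label for stages that have never run are illustrative choices, not part of the API.

```python
def summarize_stages(pipeline_state):
    """Map each stage name to the status of its latest execution."""
    return {
        stage["stageName"]: stage.get("latestExecution", {}).get("status", "NotStarted")
        for stage in pipeline_state.get("stageStates", [])
    }


def fetch_pipeline_state(pipeline_name):
    """Fetch the live state of a pipeline (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so summarize_stages() works offline

    return boto3.client("codepipeline").get_pipeline_state(name=pipeline_name)


# Hypothetical response, trimmed to the fields the summary reads.
sample_state = {
    "stageStates": [
        {"stageName": "Source", "latestExecution": {"status": "Succeeded"}},
        {"stageName": "Build", "latestExecution": {"status": "InProgress"}},
    ]
}
print(summarize_stages(sample_state))
```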

Now your continuous integration and delivery pipeline is complete. You can now make a final commit of your application source code to ensure your pipeline works as intended.

Ensure you are still in your project folder and if you did not change anything in your code, stage your code again and add a new commit message:

git add *
git commit -m "Final commit"

Push your commit to CodeCommit using:

git push

You may be required to enter your CodeCommit git credentials. Ensure you enter the correct details.

Once the push to the master branch is successful, go back to the CodePipeline page and confirm your push has triggered a new build in the pipeline:

Confirm your push has triggered a new build in the pipeline | Source: Author

Once the build is done, go to `image-app-repo` (your repo) in Elastic Container Registry to see the new image build:

Open image-app-repo | Source: Author

Great! You are now ready to deploy. Since this is a tutorial, we will skip most of the testing stage for security and other vulnerabilities. If this is a production application for your organization, you should definitely consider doing security checks. You should also avoid hardcoding credentials or API keys in your CI/CD pipeline; a credential management tool such as AWS Systems Manager Parameter Store can hold them instead.
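
To make the point about credentials concrete, here is a minimal sketch of reading a secret from Parameter Store at runtime instead of hardcoding it. `get_parameter` with `WithDecryption=True` is the real SSM call. The parameter name `/image-app/api-key` and the `FakeSSM` stand-in are assumptions so the example runs without an AWS account.

```python
def get_secret(ssm_client, name):
    """Fetch a (possibly encrypted) parameter value from SSM Parameter Store."""
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]


class FakeSSM:
    """Stand-in client that mimics the shape of the get_parameter response."""

    def __init__(self, values):
        self._values = values

    def get_parameter(self, Name, WithDecryption=False):
        return {"Parameter": {"Name": Name, "Value": self._values[Name]}}


# With real credentials you would pass boto3.client("ssm") instead of the fake.
ssm = FakeSSM({"/image-app/api-key": "not-a-real-key"})
print(get_secret(ssm, "/image-app/api-key"))
```

Passing the client in as an argument keeps the function easy to test and leaves the choice of region and credentials to the caller.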

Deploying your image classification application to AWS Lambda

Now your application build is all set to be deployed to a serverless function. To deploy it, we have to set up AWS Lambda to run the application build whenever there is a new event. In our case, whenever a new image is uploaded to our S3 bucket, it should trigger the Lambda function to run the app on the image and return the prediction and other details to DynamoDB.

Create Serverless Function with AWS Lambda

1. Open the Functions page on the Lambda console.

2. Choose Create function.

Create function | Source: Author

3. Under Create function, choose Container image. Under Basic information, enter the name of your function, choose the image in the ECR repository you created, and ensure x86_64 is selected as the architecture. Leave other options as default and click Create function:

Add function details | Source: Author

4. Once your function is created, under Function overview, click on Add trigger:

Open add trigger view | Source: Author

5. Under the Trigger configuration, select S3 and the bucket you want to trigger the function. Leave other options as default (assuming you don’t have any folder or need to include specific extension names). Check the box under Recursive invocation to acknowledge the information:

Add trigger | Source: Author

After the trigger is added, you need to allow the Lambda function to connect to the S3 bucket by setting the appropriate AWS Identity and Access Management (IAM) rights for its execution role.

6. On the Permissions tab for your function, choose the IAM role:

Select IAM role | Source: Author

7. Choose Attach policies:

Select attached policies | Source: Author

8. Search for AmazonS3ReadOnlyAccess and AmazonDynamoDBFullAccess. Attach both policies to the IAM role:

Attach AmazonS3ReadOnlyAccess and AmazonDynamoDBFullAccess policies | Source: Author
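
The two managed policies are convenient for a tutorial, but they grant more than the function uses. If you later want to tighten this, a narrower inline policy might look like the sketch below. It assumes the handler only reads objects from the inference bucket and only scans and writes to one table. The account ID, region, and table name in the ARN are placeholders.

```python
import json


def least_privilege_policy(bucket, table_arn):
    """Build an IAM policy document limited to the calls the handler makes."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Effect": "Allow",
                "Action": ["dynamodb:Scan", "dynamodb:PutItem"],
                "Resource": table_arn,
            },
        ],
    }


# Hypothetical account ID, region, and table name.
policy = least_privilege_policy(
    "images-for-inference",
    "arn:aws:dynamodb:us-east-2:123456789012:table/predictions",
)
print(json.dumps(policy, indent=2))
```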

9. Go back to your Function page. Under Configuration tab, ensure General configuration is selected and click on Edit:

Edit general configuration | Source: Author

10. Increase the memory size to 7000 MB (about 7 GB) so your application has enough memory when it runs. Also, raise the Timeout to about 5 minutes. Leave other options as default and click Save:

Edit basic settings | Source: Author
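
The same memory and timeout settings can be applied from code, which helps once you manage more than one function. This is a sketch assuming boto3 and configured credentials for the live call. `update_function_configuration` is the Lambda API for this, and the range checks reflect Lambda's limits of 128 to 10240 MB of memory and a maximum timeout of 900 seconds. The helper names are hypothetical.

```python
def lambda_settings(function_name, memory_mb=7000, timeout_s=300):
    """Build keyword arguments for Lambda's update_function_configuration call."""
    if not 128 <= memory_mb <= 10240:
        raise ValueError("Lambda memory must be between 128 and 10240 MB")
    if not 1 <= timeout_s <= 900:
        raise ValueError("Lambda timeout must be between 1 and 900 seconds")
    return {"FunctionName": function_name, "MemorySize": memory_mb, "Timeout": timeout_s}


def apply_settings(settings):
    """Send the settings to Lambda (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so lambda_settings() works offline

    return boto3.client("lambda").update_function_configuration(**settings)


# 7000 MB and 300 seconds match the values chosen in the console step above.
print(lambda_settings("image-app-func"))
```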

That’s it! You are now ready to test your application.

Testing your application

To test your application, upload a test image to your inference bucket. In our case, it is `images-for-inference`. Here’s an example of a Fall armyworm (FAW) image you can use to test this application if you are following along:

Image of Fall armyworm (FAW) | Source: Author

When you upload the image to the bucket, wait for a few minutes for your application to start (the cold-start problem we discussed earlier). Go to the page for your Lambda function. In our case, it’s `image-app-func`. Under the Monitor tab, click on View logs in CloudWatch. Check the latest log stream and see your application logs:

Check the latest log stream and see your application logs | Source: Author

You can see that the application returned the correct prediction and reported that it added a new object to the database because there was no existing object. The turnaround time was 48705 ms (about 48.7 seconds). If you are planning to run a real-time application, this might be unacceptable. If you run further predictions within 5 minutes of the previous one, the latency should be significantly lower and more suitable for real-time tasks.
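
The turnaround time comes from the `REPORT` line that Lambda writes to CloudWatch at the end of each invocation. If you want to track it across runs, a small parser like the sketch below can pull out the durations. The sample line is hypothetical apart from the 48705 ms duration quoted above, and the function name is an assumption.

```python
import re

# Longer field names come first so "Billed Duration" is not read as "Duration".
REPORT_FIELD = re.compile(r"(Billed Duration|Init Duration|Duration): ([\d.]+) ms")


def parse_report(line):
    """Return the duration fields of a Lambda REPORT log line, in milliseconds."""
    return {name: float(value) for name, value in REPORT_FIELD.findall(line)}


# Hypothetical REPORT line; fields are tab-separated in real logs.
sample = (
    "REPORT RequestId: 00000000-0000-0000-0000-000000000000\t"
    "Duration: 48705.00 ms\tBilled Duration: 48705 ms\t"
    "Memory Size: 7000 MB\tMax Memory Used: 1024 MB"
)
fields = parse_report(sample)
print(f"{fields['Duration'] / 1000:.1f} s")
```

Cold invocations also include an `Init Duration` field, which the same pattern would pick up, so comparing it across runs is one way to see how much of the latency comes from the cold start.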

Go to the DynamoDB table you created earlier and check to confirm if a new item was added to the table:

Open DynamoDB table | Source: Author
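
When you check the table, the `Count_ClassName` value you should expect follows the rule in the handler: 1 for the first prediction of a class, otherwise the highest existing count plus one. Here is a standalone sketch of that rule. The function name and the sample items are hypothetical, and the counts are `Decimal` because that is how boto3 returns DynamoDB numbers.

```python
from decimal import Decimal


def next_count(items):
    """Return the count to store for a new prediction, given prior items for the class."""
    if not items:
        return 1
    return max(item["Count_ClassName"] for item in items) + 1


# Hypothetical items shaped like rows in the predictions table.
existing = [
    {"ClassName": "FAW", "Count_ClassName": Decimal(1)},
    {"ClassName": "FAW", "Count_ClassName": Decimal(2)},
]
print(next_count(existing))
print(next_count([]))
```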

Good job! Your app is now working as expected. You can check this repository for a few more test images.

If you plan on stopping here, ensure you delete the previous image builds in ECR to avoid getting billed for them. The other services are billed pay-as-you-use, so they cost little or nothing while idle.

Next steps

Congratulations on making it to the end of this guide! As a next step, you might want to connect API Gateway, the serverless API management service, to your table so your application can get the results from the table or even delete irrelevant results. The architectural pattern would be similar to the one below:

Deployment workflow for serverless ML application | Source: Author

If you are interested in exploring this, you can find sample code for a Lambda function that connects API Gateway to the DynamoDB table with GET and DELETE methods in this repository.
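
To give a feel for what such a function involves, here is a rough sketch of a handler for API Gateway's Lambda proxy integration. It is not the code from the repository. The key schema (`PredictionDate` plus `ClassPredictionID`), the query parameter names `date` and `id`, and the `FakeTable` stand-in are assumptions so the example runs locally.

```python
import json


def api_handler(event, table):
    """Route API Gateway proxy requests to reads or deletes on the predictions table."""
    method = event.get("httpMethod")
    params = event.get("queryStringParameters") or {}
    if method == "GET":
        items = table.scan().get("Items", [])
        # default=str turns DynamoDB Decimal values into JSON-friendly strings.
        return {"statusCode": 200, "body": json.dumps(items, default=str)}
    if method == "DELETE":
        # Assumed key schema: PredictionDate (partition) + ClassPredictionID (sort).
        key = {"PredictionDate": params.get("date"), "ClassPredictionID": params.get("id")}
        if None in key.values():
            return {"statusCode": 400, "body": json.dumps({"error": "date and id are required"})}
        table.delete_item(Key=key)
        return {"statusCode": 204, "body": ""}
    return {"statusCode": 405, "body": json.dumps({"error": "method not allowed"})}


class FakeTable:
    """In-memory stand-in for a boto3 DynamoDB Table resource."""

    def __init__(self, items):
        self.items = list(items)

    def scan(self):
        return {"Items": self.items}

    def delete_item(self, Key):
        # Keep every item that differs from the key in at least one attribute.
        self.items = [i for i in self.items if any(i.get(k) != v for k, v in Key.items())]


table = FakeTable([{"PredictionDate": "2021-11-01", "ClassPredictionID": "FAW-1"}])
print(api_handler({"httpMethod": "GET"}, table)["body"])
```

In a deployed function, `table` would be the `dynamodb.Table(...)` resource created at module level, as in the inference handler.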

Applying the same architectural pattern with Google Cloud serverless

If you choose to deploy your image classification application to Google Cloud serverless instead of AWS, the great news is that this architectural pattern can be applied to most of the serverless services on GCP, using the following tools:

All these services integrate well together and using the architectural pattern from this guide, you can build a similar application on Google Cloud.

Applying the same architectural pattern with Azure serverless

If you opt to deploy your image classification application to Azure serverless, this architectural pattern can be replicated here as well with the help of the following services:


This has been a lengthy technical guide focused on a serverless pattern for deploying an image classification application. With some tweaks, this architecture can also work for other types of ML applications that involve unstructured data.

To conclude this, here are some of the best practices you should follow in using serverless for your ML apps:

  • Limit dependence on packages: in most cases, the more dependencies a function has, the slower its startup time and the harder the application is to manage.
  • Try to avoid long-running functions for your application. If your app is complex, decompose it into different functions and couple them loosely.
  • It might be helpful to send and receive data in batches. With serverless functions, you get better performance when you instantiate a function with batch data. For example, rather than sending in images to your application as they come from the client, you may want to store them in an S3 bucket instead and only trigger the function at certain intervals or when a new set of images is uploaded to the bucket.
  • Consider the ecosystem of tools available in your platform of choice for robust tracing, monitoring, auditing, and troubleshooting your applications sitting in serverless environments.
  • Load test your application before deploying it to a live environment. This is especially crucial for serverless ML applications.
  • Consider if features for selecting deployment strategies such as blue-green deployment, A/B testing, and canary deployment are available in your platform of choice and use them in your deployment workflow.
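
As an illustration of the batching practice above, here is a minimal sketch that groups object keys into fixed-size batches, so a function invocation can process several images at once. The key names and the batch size of 3 are arbitrary choices for the example.

```python
def batches(items, size):
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical object keys waiting in the inference bucket.
keys = [f"uploads/img_{n:03d}.jpg" for n in range(7)]
for batch in batches(keys, 3):
    print(len(batch), batch[0])
```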
