Running your AI Workload through Azure Container Apps

For a new project I am working on, I am using a queuing architecture: jobs come in and a cluster of containers processes them. Each worker takes an item from the queue, runs it through its AI pipeline and spits out a JSON response.

Setting up an infrastructure to accommodate the above is quite "interesting", as it requires:

  • Creating a Kubernetes cluster
  • Creating and hosting a queueing solution
  • Autoscaling your cluster on a hardware level depending on the items in your queue
  • Allowing cross-microservice invocation

Typically we would solve the above by utilizing a queue (e.g. RabbitMQ), a Kubernetes cluster that autoscales (e.g. with KEDA) and an image that pulls from this queue in an abstracted way (e.g. with Dapr), with Dapr additionally handling cross-microservice invocation.
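To make that concrete, here is a minimal sketch of the kind of worker loop that lives inside such an image, assuming a RabbitMQ queue named jobs and the pika client; run_pipeline is a hypothetical stand-in for the actual AI pipeline:

import json

import pika  # assumed RabbitMQ client; Dapr would abstract this away

def run_pipeline(job):
    # Hypothetical stand-in for the AI pipeline described above
    return {"job_id": job.get("id"), "status": "done"}

def handle_job(ch, method, properties, body):
    # Each message on the queue is one job; ack it once processed
    result = run_pipeline(json.loads(body))
    print(json.dumps(result))
    ch.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="jobs", durable=True)
channel.basic_consume(queue="jobs", on_message_callback=handle_job)
channel.start_consuming()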

In the above, the main annoying part remains: scaling your hardware down to zero based on the items in your queue. KEDA makes this easy, but it's still annoying and requires some work.

Luckily, this is where Azure Container Apps comes in!

Let's see how we can utilize Container Apps to expose the YoloV5 model through an API.

Building a YoloV5 Container with API

Code

Create a main.py file with the following code, which pulls the model from Torch Hub and serves it on port 5000 at the endpoint /v1/object-detection/yolov5s:

import argparse
import io
from PIL import Image

import torch
from flask import Flask, request

app = Flask(__name__)

@app.route("/", methods=["GET"])
def hello():
    return {
        "hello": "world"
    }

@app.route("/v1/object-detection/yolov5s", methods=["POST"])
def predict():
    if not request.method == "POST":
        return

    if request.files.get("image"):
        image_file = request.files["image"]
        image_bytes = image_file.read()

        img = Image.open(io.BytesIO(image_bytes))

        results = model(img, size=640)
        data = results.pandas().xyxy[0].to_json(orient="records")
        return data


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Flask api exposing yolov5 model")
    parser.add_argument("--port", default=5000, type=int, help="port number")
    args = parser.parse_args()

    # Recent YOLOv5 hub models come back already auto-shaped, so the
    # deprecated .autoshape() call is no longer needed
    model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True, force_reload=True)
    model.eval()
    app.run(host="0.0.0.0", port=args.port)
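
Before packaging anything, we can smoke-test the API locally (start it with python main.py --port 5000 first); demo.jpg is a placeholder for any local image:

import requests

# Send a local image to the running API and print the detections
with open("demo.jpg", "rb") as f:
    resp = requests.post(
        "http://localhost:5000/v1/object-detection/yolov5s",
        files={"image": f},
    )

print(resp.json())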

As the final piece of code, we add our requirements.txt:

flask
requests
black

matplotlib>=3.2.2
numpy>=1.18.5
opencv-python>=4.1.2
Pillow
PyYAML>=5.3.1
scipy>=1.4.1
torch>=1.7.0
torchvision>=0.8.1
tqdm>=4.41.0

tensorboard>=2.4.1

seaborn>=0.11.0
pandas

thop  # FLOPs computation

Packaging it

Finally, we can package the above as a container by creating a Dockerfile with the following content:

FROM python:3.8-slim-buster

# OpenCV needs ffmpeg and a few X11 libraries at runtime
RUN apt-get update && apt-get install -y ffmpeg libsm6 libxext6

WORKDIR /app

# Install the dependencies first so Docker can cache this layer
COPY requirements.txt /app/requirements.txt
RUN pip install -r requirements.txt

# Copy in the application code
COPY . /app

EXPOSE 5000

CMD ["python", "main.py", "--port=5000"]

We can then build the container by running:

docker build -t api-yolov5 .

To make things easier, I have already pushed this container to Docker Hub as thebillkidy/api-yolov5.

Creating our Azure Container Apps

Now that our container has been created, we can get started hosting it! To do this, let's first configure our Azure Container Apps access in the CLI and then publish the container above.

Prerequisites

First, configure the Azure CLI: log in, add the Container Apps extension and register the resource provider:

# Login
az login
az account set --subscription <id>

# Add CLI Extensions
az extension add \
  --source https://workerappscliextension.blob.core.windows.net/azure-cli-extension/containerapp-0.2.0-py2.py3-none-any.whl 

# Register the Web Provider
az provider register --namespace Microsoft.Web

Create

RESOURCE_GROUP="demo"
LOCATION="northeurope"
LOG_ANALYTICS_WORKSPACE="my-container-apps-logs"
CONTAINERAPPS_ENVIRONMENT="my-environment"

az group create --name $RESOURCE_GROUP --location "$LOCATION"

az monitor log-analytics workspace create \
  --resource-group $RESOURCE_GROUP \
  --workspace-name $LOG_ANALYTICS_WORKSPACE

# Get Log Analytics Client ID and Client Secret
LOG_ANALYTICS_WORKSPACE_CLIENT_ID=`az monitor log-analytics workspace show --query customerId -g $RESOURCE_GROUP -n $LOG_ANALYTICS_WORKSPACE --out tsv`
LOG_ANALYTICS_WORKSPACE_CLIENT_SECRET=`az monitor log-analytics workspace get-shared-keys --query primarySharedKey -g $RESOURCE_GROUP -n $LOG_ANALYTICS_WORKSPACE --out tsv`

# Create Container App Environment
# this is where all the container apps are deployed
az containerapp env create \
  --name $CONTAINERAPPS_ENVIRONMENT \
  --resource-group $RESOURCE_GROUP \
  --logs-workspace-id $LOG_ANALYTICS_WORKSPACE_CLIENT_ID \
  --logs-workspace-key $LOG_ANALYTICS_WORKSPACE_CLIENT_SECRET \
  --location "$LOCATION"

# Create our Container App
az containerapp create \
  --name my-container-app \
  --resource-group $RESOURCE_GROUP \
  --environment $CONTAINERAPPS_ENVIRONMENT \
  --image thebillkidy/api-yolov5:latest \
  --target-port 5000 \
  --ingress 'external' \
  --query configuration.ingress.fqdn

Testing it

Once the last command (az containerapp create) completes, we get back a URL that looks like this: my-container-app.unique_id.region.azurecontainerapps.io. When we call it in the browser, we will get:

{
    "hello": "world"
}

Which means it is up and running! To call the inference endpoint, we can run:

curl -X POST -F image=@demo.jpg 'https://my-container-app.unique_id.region.azurecontainerapps.io/v1/object-detection/yolov5s'

The above sends the local demo.jpg picture to our endpoint and returns the inferred result, which in this case shows that we have 4 chairs, 4 potted plants, 2 persons and 1 couch in this image!

[
    {"xmin":103.2421798706,"ymin":292.3749694824,"xmax":437.8124694824,"ymax":615.8124389648,"confidence":0.8725585938,"class":56,"name":"chair"},
    {"xmin":774.8436889648,"ymin":246.4374847412,"xmax":1126.8748779297,"ymax":567.5311889648,"confidence":0.841796875,"class":56,"name":"chair"},
    {"xmin":0.3515624702,"ymin":244.3281097412,"xmax":150.3515472412,"ymax":483.6249389648,"confidence":0.8344726562,"class":56,"name":"chair"},
    {"xmin":752.3436889648,"ymin":97.0234298706,"xmax":821.7186889648,"ymax":220.5390472412,"confidence":0.8227539062,"class":0,"name":"person"},
    {"xmin":536.2499389648,"ymin":107.1015548706,"xmax":659.0624389648,"ymax":210.4609222412,"confidence":0.7124023438,"class":0,"name":"person"},
    {"xmin":356.0155944824,"ymin":218.4296722412,"xmax":984.3749389648,"ymax":491.1249389648,"confidence":0.6181640625,"class":57,"name":"couch"},
    {"xmin":85.8984298706,"ymin":154.0937347412,"xmax":185.5077972412,"ymax":284.8749694824,"confidence":0.5854492188,"class":58,"name":"potted plant"},
    {"xmin":674.5311889648,"ymin":121.9843673706,"xmax":729.8436889648,"ymax":205.1874847412,"confidence":0.384765625,"class":58,"name":"potted plant"},
    {"xmin":233.5546722412,"ymin":220.0702972412,"xmax":367.0312194824,"ymax":321.9062194824,"confidence":0.2978515625,"class":56,"name":"chair"},
    {"xmin":440.3905944824,"ymin":0.0,"xmax":626.7186889648,"ymax":173.1952972412,"confidence":0.2956542969,"class":58,"name":"potted plant"},
    {"xmin":0.9082030654,"ymin":146.59375,"xmax":44.5898399353,"ymax":170.0312347412,"confidence":0.2568359375,"class":58,"name":"potted plant"}
]
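
As a quick sanity check on those counts, here is a small sketch that tallies the detections per class, assuming the response above was saved to result.json:

import json
from collections import Counter

# Load the JSON array returned by the endpoint and count each class name
with open("result.json") as f:
    detections = json.load(f)

counts = Counter(d["name"] for d in detections)
print(dict(counts))  # {'chair': 4, 'person': 2, 'couch': 1, 'potted plant': 4}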

Conclusion

It has never been easier to create a workload that dynamically scales based on incoming requests! In a future post, I will explain how we can combine this with Dapr to pull items from a queue instead of serving an HTTP endpoint.