Running your AI Workload through Azure Container Apps
For a new project I am working on, I am utilizing a queuing architecture where jobs come in and a cluster of containers processes work from this queue. Each worker takes an item from the queue, runs it through its AI pipeline and spits out a JSON response.
Setting up an infrastructure to accommodate the above is quite "interesting" as it requires:
- Creating a Kubernetes cluster
- Creating and hosting a queueing solution
- Autoscaling your cluster on a hardware level depending on the items in your queue
- Allowing cross-microservice invocation
Typically we would solve the above by utilizing a queue (e.g. RabbitMQ), a Kubernetes cluster that autoscales (e.g. with KEDA) and an image that pulls from this queue in an abstracted way (e.g. with Dapr), with Dapr additionally handling cross-microservice invocation.
In the above, the main annoying part remains: "scaling your hardware down to zero based on your queue". KEDA makes this easy, but it is still annoying and requires some work.
Luckily this is where "Container Apps" comes in!
Let's see how we can utilize Container Apps to expose the YoloV5 model through an API.
Building a YoloV5 Container with API
Code
Create a main.py file with the following code, which pulls the model from Torch Hub and serves it on port 5000 with the endpoint /v1/object-detection/yolov5s:
import argparse
import io

from PIL import Image
import torch
from flask import Flask, request

app = Flask(__name__)


@app.route("/", methods=["GET"])
def hello():
    return {"hello": "world"}


@app.route("/v1/object-detection/yolov5s", methods=["POST"])
def predict():
    if request.files.get("image"):
        # Read the uploaded image and run it through the model
        image_file = request.files["image"]
        image_bytes = image_file.read()
        img = Image.open(io.BytesIO(image_bytes))

        results = model(img, size=640)

        # Return the detections as a list of JSON records
        return results.pandas().xyxy[0].to_json(orient="records")

    return {"error": "no image provided"}, 400


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Flask API exposing the YoloV5 model")
    parser.add_argument("--port", default=5000, type=int, help="port number")
    args = parser.parse_args()

    # Pull the pretrained model from Torch Hub; recent hub versions return an
    # autoshaped model directly, so no explicit .autoshape() call is needed
    model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True, force_reload=True)
    model.eval()

    app.run(host="0.0.0.0", port=args.port)
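Before containerizing this, we can sanity-check the API with a small client script. This is a minimal sketch, assuming the server above is running locally on port 5000 and a test image (hypothetically named demo.jpg) sits next to the script; the requests package it uses is part of the requirements.txt below:
import requests

# POST a local test image to the running API
with open("demo.jpg", "rb") as f:
    response = requests.post(
        "http://localhost:5000/v1/object-detection/yolov5s",
        files={"image": f},
    )

# Each record contains the bounding box, confidence and class name
for detection in response.json():
    print(detection["name"], detection["confidence"])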
As a last piece of code, we add our requirements.txt:
flask
requests
black
matplotlib>=3.2.2
numpy>=1.18.5
opencv-python>=4.1.2
Pillow
PyYAML>=5.3.1
scipy>=1.4.1
torch>=1.7.0
torchvision>=0.8.1
tqdm>=4.41.0
tensorboard>=2.4.1
seaborn>=0.11.0
pandas
thop # FLOPs computation
Packaging it
Finally, we can package the above as a container by creating a Dockerfile with the following content:
FROM python:3.8-slim-buster

# Install the system libraries required by OpenCV
RUN apt-get update && \
    apt-get install -y ffmpeg libsm6 libxext6

WORKDIR /app

# Install the Python dependencies first to leverage the Docker layer cache
COPY requirements.txt /app/requirements.txt
RUN pip install -r requirements.txt

COPY . /app

EXPOSE 5000
CMD ["python", "main.py", "--port=5000"]
We can now build the container by running:
docker build -t api-yolov5 .
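Before pushing the image anywhere, we can verify it locally by running it and hitting the health endpoint (assuming port 5000 is free on the host):
docker run -p 5000:5000 api-yolov5
# In a second terminal
curl http://localhost:5000/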
To make things easier, I have already pushed this container to Docker Hub.
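If you prefer to push your own build instead, the standard Docker Hub flow applies; replace <your-username> with your own Docker Hub account:
docker tag api-yolov5 <your-username>/api-yolov5:latest
docker push <your-username>/api-yolov5:latest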
Creating our Azure Container App
Now that our container has been created, we can get started hosting it! To do this, let's first configure Azure Container Apps access in the CLI and then publish the container we built above.
Prerequisites
First, configure the Azure CLI to include the Container Apps extension and make sure we are logged in:
# Login
az login
az account set --subscription <id>
# Add CLI Extensions
az extension add \
--source https://workerappscliextension.blob.core.windows.net/azure-cli-extension/containerapp-0.2.0-py2.py3-none-any.whl
# Register the Web Provider
az provider register --namespace Microsoft.Web
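Provider registration can take a few moments; we can verify that it completed before continuing:
az provider show --namespace Microsoft.Web --query registrationState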
Create
RESOURCE_GROUP="demo"
LOCATION="northeurope"
LOG_ANALYTICS_WORKSPACE="my-container-apps-logs"
CONTAINERAPPS_ENVIRONMENT="my-environment"
az group create --name $RESOURCE_GROUP --location "$LOCATION"
az monitor log-analytics workspace create \
  --resource-group $RESOURCE_GROUP \
  --workspace-name $LOG_ANALYTICS_WORKSPACE
# Get the Log Analytics Client ID and Client Secret
LOG_ANALYTICS_WORKSPACE_CLIENT_ID=$(az monitor log-analytics workspace show --query customerId -g $RESOURCE_GROUP -n $LOG_ANALYTICS_WORKSPACE --out tsv)
LOG_ANALYTICS_WORKSPACE_CLIENT_SECRET=$(az monitor log-analytics workspace get-shared-keys --query primarySharedKey -g $RESOURCE_GROUP -n $LOG_ANALYTICS_WORKSPACE --out tsv)
# Create the Container App Environment
# this is where all the container apps are deployed
az containerapp env create \
  --name $CONTAINERAPPS_ENVIRONMENT \
  --resource-group $RESOURCE_GROUP \
  --logs-workspace-id $LOG_ANALYTICS_WORKSPACE_CLIENT_ID \
  --logs-workspace-key $LOG_ANALYTICS_WORKSPACE_CLIENT_SECRET \
  --location "$LOCATION"
# Create our Container App
az containerapp create \
  --name my-container-app \
  --resource-group $RESOURCE_GROUP \
  --environment $CONTAINERAPPS_ENVIRONMENT \
  --image thebillkidy/api-yolov5:latest \
  --target-port 5000 \
  --ingress 'external' \
  --query configuration.ingress.fqdn
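By default, Container Apps scale on incoming HTTP traffic and can scale all the way down to zero replicas when idle. If you want to bound the replica count explicitly, the CLI extension exposes min/max replica flags; a minimal sketch, assuming these flags are available in your extension version:
az containerapp update \
  --name my-container-app \
  --resource-group $RESOURCE_GROUP \
  --min-replicas 0 \
  --max-replicas 5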
Testing it
Once we run the last command (az containerapp create), we get back a URL that looks like this: my-container-app.unique_id.region.azurecontainerapps.io. When we call it in the browser, we will get:
{
"hello": "world"
}
This means it is up and running! To call the inference endpoint, we can now run:
curl -X POST -F image=@demo.jpg 'https://my-container-app.unique_id.region.azurecontainerapps.io/v1/object-detection/yolov5s'
The above sends the local demo.jpg picture to our endpoint and returns the inferred result, which in this case shows that we have 4 chairs, 4 potted plants, 2 persons and 1 couch in this image!
[
{"xmin":103.2421798706,"ymin":292.3749694824,"xmax":437.8124694824,"ymax":615.8124389648,"confidence":0.8725585938,"class":56,"name":"chair"},
{"xmin":774.8436889648,"ymin":246.4374847412,"xmax":1126.8748779297,"ymax":567.5311889648,"confidence":0.841796875,"class":56,"name":"chair"},
{"xmin":0.3515624702,"ymin":244.3281097412,"xmax":150.3515472412,"ymax":483.6249389648,"confidence":0.8344726562,"class":56,"name":"chair"},
{"xmin":752.3436889648,"ymin":97.0234298706,"xmax":821.7186889648,"ymax":220.5390472412,"confidence":0.8227539062,"class":0,"name":"person"},
{"xmin":536.2499389648,"ymin":107.1015548706,"xmax":659.0624389648,"ymax":210.4609222412,"confidence":0.7124023438,"class":0,"name":"person"},
{"xmin":356.0155944824,"ymin":218.4296722412,"xmax":984.3749389648,"ymax":491.1249389648,"confidence":0.6181640625,"class":57,"name":"couch"},
{"xmin":85.8984298706,"ymin":154.0937347412,"xmax":185.5077972412,"ymax":284.8749694824,"confidence":0.5854492188,"class":58,"name":"potted plant"},
{"xmin":674.5311889648,"ymin":121.9843673706,"xmax":729.8436889648,"ymax":205.1874847412,"confidence":0.384765625,"class":58,"name":"potted plant"},
{"xmin":233.5546722412,"ymin":220.0702972412,"xmax":367.0312194824,"ymax":321.9062194824,"confidence":0.2978515625,"class":56,"name":"chair"},
{"xmin":440.3905944824,"ymin":0.0,"xmax":626.7186889648,"ymax":173.1952972412,"confidence":0.2956542969,"class":58,"name":"potted plant"},
{"xmin":0.9082030654,"ymin":146.59375,"xmax":44.5898399353,"ymax":170.0312347412,"confidence":0.2568359375,"class":58,"name":"potted plant"}
]
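To turn this raw list into the per-class counts mentioned above, here is a small post-processing sketch (assuming the response body is stored as a string in a variable hypothetically named response_body):
import json
from collections import Counter

# Count the detections per class name
detections = json.loads(response_body)
counts = Counter(d["name"] for d in detections)
print(counts)  # Counter({'chair': 4, 'potted plant': 4, 'person': 2, 'couch': 1})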
Conclusion
It has never been easier to create a workload that dynamically scales based on the incoming requests! In a follow-up post, I will explain how we can combine this with Dapr to pull items from a queue instead of an HTTP endpoint.