Auto scaling an HTTP triggered application in Kubernetes using KEDA

Xavier Geerinck

October 21, 2020 / ai ai-ml

Kubernetes is getting more popular every day, and it’s no wonder why! Whether you run applications on-premises or in the cloud, having them packaged in a portable way is a strong advantage: it removes the friction of scaling out your application when you are ready for it, and it even supports bursting scenarios.

Use Case

For a use case I have been working on, I have the following requirements:

  • 1 HTTP request = 1 Container
  • Instance can be up to 1GB of memory
  • No pub/sub, but synchronous HTTP request/response!
  • Request execution time can be up to 60 seconds
  • I do not want to utilize a pool, but have an auto-scaling system in place

Possible solutions

As in any project, there are multiple paths towards a suitable solution. After researching this a bit, I came across the following tools that looked promising:

  • Knative

    • This felt like overkill. It’s quite complex to install (there are tools such as knctl, but they are outdated) and it seems quite heavy on the cluster.
  • Keda

    • Super interesting autoscaler, but it is event-driven and does not support the HTTP-based scaling that I would need.
  • Fission

    • Interesting! However, it seems to keep images in memory (which for us would mean 1GB in memory all the time), and its autoscaling is similar to KEDA’s in this sense (it even supports KEDA).

Looking at the above, I was kind of disappointed, since no solution “seemed” to exist. But after a bit more experimenting and exploration, I came across a blog post by Anirudh Garg explaining how you can hook up an Nginx Ingress controller to KEDA!

Now the above does not solve my issue completely, seeing that I want to transform an async request into a synchronous request (i.e. I want to receive my HTTP request, auto scale the cluster and then return the response). Thus a custom component will have to be written in my case. The blog post does, however, go into how KEDA can be utilized, which seems to be an excellent fit for me!

Architecture

As always, the most important step when creating a new application is to at least draw out a high-level architecture of what the idea will look like. It might sound like a boring and unnecessary job, but it helps me a lot as a thought exercise that makes me think things through before wasting time on things that won’t work. Even if it doesn’t work out correctly, you at least tried 😉

Architecture

Note: you can find the full source code utilised in this post here: https://github.com/XavierGeerinck/PublicProjects/tree/master/JS/Dapr/AutoScalingHTTP

Solution (Gateway and Worker)

Used Technologies

As for technologies, I decided to settle on the following:

  • Dapr
  • NodeJS

    • Express
  • RabbitMQ
  • Kubernetes
  • KEDA

Prerequisites

Before we get started, make sure the following is available:

  • Helm
  • Kubernetes cluster with kubectl mapped to it (Note: locally I just run minikube start --cpus 2 --memory 2048 and I’m ready to go)

Installing RabbitMQ

We start by setting up our queue. As shown in the architecture, we will have an inbound application that puts items on a queue. As soon as these items have been published, a worker will take them and process them (in our case, returning “Hello World” after a small delay).

To install RabbitMQ, run the following commands:

# ref: https://github.com/bitnami/charts/tree/master/bitnami/rabbitmq/#parameters
# Install repo
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

# Install RabbitMQ
helm install rabbitmq --set auth.username=admin,auth.password=admin,volumePermissions.enabled=true bitnami/rabbitmq

Note: The chart’s default username is “user”, but the command above sets both the username and the password to “admin”. The default host is “rabbitmq.default.svc.cluster.local” on port 5672, following the Kubernetes service DNS convention “SVCNAME.NAMESPACE.svc.cluster.local”. The connection string then becomes `amqp://USERNAME:YOURPASSWORD@rabbitmq.default.svc.cluster.local:5672`.
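To make the naming convention from the note concrete, here is a minimal sketch of a helper that assembles such a connection string. The `AmqpTarget` interface and `buildAmqpUrl` function are hypothetical names introduced for illustration; they are not part of the repository.

```typescript
// Hypothetical helper (not from the original repository): builds an AMQP URL
// for a RabbitMQ Service running inside the cluster.
interface AmqpTarget {
  user: string;
  password: string;
  service: string;   // Kubernetes Service name, e.g. "rabbitmq"
  namespace: string; // Kubernetes namespace, e.g. "default"
  port?: number;     // AMQP port, 5672 unless overridden
}

function buildAmqpUrl(target: AmqpTarget): string {
  // In-cluster DNS name of a Service: SVCNAME.NAMESPACE.svc.cluster.local
  const host = `${target.service}.${target.namespace}.svc.cluster.local`;
  // Percent-encode the credentials so characters such as "@" or "/" in a
  // password cannot break the URL.
  const user = encodeURIComponent(target.user);
  const password = encodeURIComponent(target.password);
  return `amqp://${user}:${password}@${host}:${target.port ?? 5672}`;
}

const amqpUrl = buildAmqpUrl({
  user: "admin",
  password: "admin",
  service: "rabbitmq",
  namespace: "default",
});
console.log(amqpUrl);
```

Encoding the credentials matters mostly when a generated password is used instead of the simple one set through Helm.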

Important: If you installed RabbitMQ before, make sure to delete the old PersistentVolumeClaim (list them with kubectl get pvc), otherwise the new password won’t work.

Installing Dapr

With a Kubernetes cluster ready, we deploy and install Dapr on it. We can do this by running the following commands:

# Add helm repo
helm repo add dapr https://dapr.github.io/helm-charts/
helm repo update

# Create namespace dapr-system
kubectl create namespace dapr-system

# Install latest dapr version
helm install dapr dapr/dapr --namespace dapr-system

Installing Dapr Bindings for RabbitMQ

Once Dapr and RabbitMQ have been installed, we can create bindings that will allow us to receive RabbitMQ events on a specific endpoint, as well as send events to RabbitMQ by calling an HTTP endpoint.

For this we create the following YAML files and apply them to our Kubernetes cluster with kubectl apply -f binding-rabbitmq-in.yaml and kubectl apply -f binding-rabbitmq-out.yaml

binding-rabbitmq-in.yaml

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: rabbitmq-worker-input
  namespace: default
spec:
  type: bindings.rabbitmq
  metadata:
  - name: host
    value: amqp://user:YOUR_PASSWORD@rabbitmq.default.svc.cluster.local:5672
  - name: queueName
    value: rabbitmq-worker-input

binding-rabbitmq-out.yaml

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: rabbitmq-worker-output
  namespace: default
spec:
  type: bindings.rabbitmq
  metadata:
  - name: host
    value: amqp://user:YOUR_PASSWORD@rabbitmq.default.svc.cluster.local:5672
  - name: queueName
    value: rabbitmq-worker-output
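As a rough illustration of how an application could hand a message to one of these bindings, the sketch below builds the HTTP request for Dapr’s bindings endpoint (`/v1.0/bindings/<component name>` with a `create` operation). The `buildBindingRequest` helper and the payload shape (`id`, `name`, `timeout`) are assumptions made for this example, and I am assuming the gateway publishes to `rabbitmq-worker-input`.

```typescript
// Sketch only: describes the HTTP call that hands a message to a Dapr output
// binding. The payload shape ({ id, name, timeout }) is an assumption.
interface BindingRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildBindingRequest(
  daprBaseUrl: string,
  bindingName: string,
  data: unknown
): BindingRequest {
  return {
    // Dapr exposes bindings under /v1.0/bindings/<component name>.
    url: `${daprBaseUrl}/v1.0/bindings/${encodeURIComponent(bindingName)}`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // "create" is the operation that publishes the data to the queue.
    body: JSON.stringify({ operation: "create", data }),
  };
}

const bindingRequest = buildBindingRequest(
  "http://127.0.0.1:3500",
  "rabbitmq-worker-input",
  { id: "req-1", name: "Xavier", timeout: 1000 }
);
console.log(bindingRequest.url);
console.log(bindingRequest.body);
```

The returned object can be passed to whichever HTTP client the application already uses. Keeping the request construction separate from the actual call makes it easy to unit test.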

Creating our Gateway

On to the hardest part: the gateway! If we take another look at our architecture, we can see that 2 servers are required here. The reason is that we want to separate external traffic (user requests) from the internal Dapr traffic.

architecture gateway

We thus create 2 files named src/server-dapr.ts and src/server-external.ts that will handle the internal Dapr traffic and the external user traffic respectively.
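The core trick in the gateway is keeping the user’s HTTP request open until the matching result comes back from the queue. A minimal sketch of one way to do this is shown here, assuming every request carries a unique id that the worker echoes back. The `PendingRequests` class is hypothetical and is not taken from the repository.

```typescript
// Hypothetical registry that lets an HTTP handler wait for a queued result.
class PendingRequests<T> {
  private readonly waiting = new Map<
    string,
    { resolve: (value: T) => void; timer: ReturnType<typeof setTimeout> }
  >();

  // Called by the external server: returns a promise that resolves when the
  // result for `id` arrives, or rejects after `timeoutMs` milliseconds.
  wait(id: string, timeoutMs: number): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const timer = setTimeout(() => {
        this.waiting.delete(id);
        reject(new Error(`Request ${id} timed out after ${timeoutMs} ms`));
      }, timeoutMs);
      this.waiting.set(id, { resolve, timer });
    });
  }

  // Called by the Dapr-facing server when a result comes back from the queue.
  // Returns false when nobody is waiting for this id (e.g. it timed out).
  complete(id: string, value: T): boolean {
    const entry = this.waiting.get(id);
    if (!entry) return false;
    clearTimeout(entry.timer);
    this.waiting.delete(id);
    entry.resolve(value);
    return true;
  }

  get size(): number {
    return this.waiting.size;
  }
}

// Usage: the external server waits, the Dapr-facing server completes.
const pending = new PendingRequests<string>();
const answer = pending.wait("req-1", 60000);
pending.complete("req-1", "Hello World");
answer.then((message) => console.log(message));
```

Because the map lives in memory, this approach only works when the reply reaches the same gateway instance that accepted the request, which holds for a single gateway replica.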

Note: For the full source code, feel free to check the repository https://github.com/XavierGeerinck/PublicProjects/tree/master/JS/Dapr/AutoScalingHTTP

Note: For testing purposes, it’s easier to start the instance locally by running dapr run --app-id gateway --app-port 4000 --components-path ../components npm run start:dev from the AutoScalingHTTP/Gateway folder of the source code.

Once this is done, we can deploy it through the following Kubernetes YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: d-dapr-autoscaling-http-gateway
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dapr-autoscaling-http-gateway
  template:
    metadata:
      labels:
        app: dapr-autoscaling-http-gateway
      annotations:
        dapr.io/enabled: "true" # Do we inject a sidecar to this deployment?
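        dapr.io/id: "dapr-autoscaling-http-gateway" # Unique ID or Name for Dapr App (so we can communicate with it); newer Dapr releases name this annotation dapr.io/app-id
        dapr.io/port: "5001" # Port we are going to listen on for Dapr interactions (is app specific); newer Dapr releases name this annotation dapr.io/app-port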
    spec:
      containers:
      - name: main # Simple name so we can reach it easily
        image: thebillkidy/dapr-autoscaling-http-gateway:latest
        imagePullPolicy: Always
        ports:
          - containerPort: 5000 # This port we will expose (external port)
        env:
        - name: DAPR_HOST
          value: "127.0.0.1"
        - name: DAPR_PORT
          value: "3500"

Apply it with kubectl apply -f deploy/k8s-gateway.yaml, then run kubectl logs -f deployment/d-dapr-autoscaling-http-gateway -c main to view the logs.
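The deployment passes the sidecar location to the container through the DAPR_HOST and DAPR_PORT environment variables. A small sketch of how the application could turn these into a base URL follows. The `daprBaseUrlFromEnv` function is a hypothetical name, and the fallback values are assumptions that mirror the YAML above.

```typescript
// Sketch: derive the Dapr sidecar base URL from environment variables.
// The fallbacks (127.0.0.1 and 3500) mirror the values in the deployment.
function daprBaseUrlFromEnv(env: Record<string, string | undefined>): string {
  const host = env["DAPR_HOST"] ?? "127.0.0.1";
  const rawPort = env["DAPR_PORT"] ?? "3500";
  const port = Number(rawPort);
  // Fail fast on a misconfigured port instead of producing a broken URL.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid DAPR_PORT: ${rawPort}`);
  }
  return `http://${host}:${port}`;
}

console.log(daprBaseUrlFromEnv({ DAPR_HOST: "127.0.0.1", DAPR_PORT: "3500" }));
```

In a Node.js application the function would typically be called with `process.env`.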

Now that this is deployed, we can expose the service so that we can access it externally:

# Production clusters
kubectl expose deployment d-dapr-autoscaling-http-gateway --type=LoadBalancer --name=svc-dapr-autoscaling-http-gateway --port=80 --target-port=5000

# Development (minikube) cluster
# when running on dev with minikube, we need to expose the svc like this through kubectl port-forward which will open on a random port
kubectl port-forward --address 0.0.0.0 deployment/d-dapr-autoscaling-http-gateway :5000

Creating our Worker

The worker is a bit easier: in this case we just listen for incoming work through the input binding that we created. Once work comes in, we do something (in this case, wait for a timeout) and then put the result back on the queue! 😀
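The processing step itself can be sketched as a small async function. The `WorkItem` and `WorkResult` shapes and the `processWork` name are assumptions for illustration. In the real worker, a function like this would be called from the HTTP route that Dapr posts incoming messages to, and its result would be sent out through the output binding.

```typescript
// Assumed message shapes; the repository may use different field names.
interface WorkItem {
  id: string;      // correlation id so the gateway can match the reply
  name: string;    // value taken from the ?name= query parameter
  timeout: number; // simulated processing time in milliseconds
}

interface WorkResult {
  id: string;
  message: string;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function processWork(item: WorkItem): Promise<WorkResult> {
  // Clamp the delay to the 60 second budget from the requirements.
  const delay = Math.min(Math.max(item.timeout, 0), 60000);
  await sleep(delay);
  return { id: item.id, message: `Hello ${item.name}` };
}

processWork({ id: "req-1", name: "World", timeout: 10 }).then((result) =>
  console.log(result.id, result.message)
);
```

Echoing the `id` back in the result is what allows the gateway to find the HTTP request that is still waiting for this answer.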

Kubernetes deployment YAML:

# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: d-dapr-autoscaling-http-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dapr-autoscaling-http-worker
  template:
    metadata:
      labels:
        app: dapr-autoscaling-http-worker
      annotations:
        dapr.io/enabled: "true" # Do we inject a sidecar to this deployment?
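        dapr.io/id: "dapr-autoscaling-http-worker" # Unique ID or Name for Dapr App (so we can communicate with it); newer Dapr releases name this annotation dapr.io/app-id
        dapr.io/port: "3000" # Port we are going to listen on for Dapr interactions (is app specific); newer Dapr releases name this annotation dapr.io/app-port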
    spec:
      containers:
      - name: main # Simple name so we can reach it easily
        image: thebillkidy/dapr-autoscaling-http-worker:v0.0.1
        imagePullPolicy: Always
        ports:
          - containerPort: 3000
        env:
        - name: DAPR_HOST
          value: "127.0.0.1"
        - name: DAPR_PORT
          value: "3500"

and run kubectl apply -f deploy/k8s-worker.yaml

Demo Application

When we now go to the URL of the Gateway and we enter some query parameters (e.g. http://172.28.28.243:38203/?name=Xavier%20Geerinck&timeout=10000) we will see the following result:

demo 1

This shows what we wanted to achieve: a synchronous HTTP client that works over an asynchronous queue implementation for offloading the work to workers. But how do we now autoscale this? Well, this is where KEDA comes in!

Setting up Autoscaling with KEDA

KEDA stands for “Kubernetes Event-driven Autoscaling”; it allows us to monitor certain resources and automatically scale a deployment up or down based on our needs. It’s easy to set up and powerful for horizontally scalable solutions. So let’s get started on configuring this for our set-up!

Installing KEDA

The first thing we should do is install KEDA. We can do this by adding the Helm repository and installing the chart into a newly created namespace:

helm repo add kedacore https://kedacore.github.io/charts
helm repo update

# Install KEDA v2
kubectl create namespace keda
helm install keda kedacore/keda --version 2.0.0-rc --namespace keda

Configuring KEDA

Once KEDA is up and running, we need to configure it to autoscale based on the length of our RabbitMQ queue. To describe this, we can utilize a YAML configuration that details the different aspects that we want to watch and act upon.

keda-rabbitmq-dapr-http-autoscale.yaml

# https://keda.sh/docs/2.0/scalers/rabbitmq-queue/
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: keda-dapr-autoscaling-http-worker
  namespace: default
spec:
  scaleTargetRef:
    name: d-dapr-autoscaling-http-worker
  triggers:
  - type: rabbitmq
    metadata:
      host: amqp://admin:admin@rabbitmq.default.svc.cluster.local:5672 # references a value of format amqp://guest:password@localhost:5672/vhost
      queueName: rabbitmq-worker-input
      queueLength: "2" # Target number of queued messages per replica

Once we have created this, another kubectl apply -f deploy/keda-rabbitmq-dapr-http-autoscale.yaml will do the trick to configure the autoscaling!
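To get a feeling for what a queueLength of 2 means, here is a simplified model of the replica count that such a queue-length target leads to: roughly the number of queued messages divided by the target per replica, rounded up and bounded by the minimum and maximum replica counts. The `desiredReplicas` function is a hypothetical illustration, and the bounds of 0 and 100 are assumptions about KEDA’s defaults, since the ScaledObject above does not set them. The real autoscaler also applies tolerances and cooldown periods that are ignored here.

```typescript
// Simplified model of queue-length based scaling (not KEDA's actual code).
function desiredReplicas(
  queuedMessages: number,
  targetPerReplica: number,
  minReplicas: number = 0,   // assumed default lower bound
  maxReplicas: number = 100  // assumed default upper bound
): number {
  if (targetPerReplica <= 0) {
    throw new Error("targetPerReplica must be positive");
  }
  // Each replica is expected to handle `targetPerReplica` queued messages.
  const wanted = Math.ceil(queuedMessages / targetPerReplica);
  return Math.min(Math.max(wanted, minReplicas), maxReplicas);
}

[0, 1, 2, 3, 10, 500].forEach((messages) => {
  console.log(`${messages} queued -> ${desiredReplicas(messages, 2)} replicas`);
});
```

Under this model, 10 waiting messages with a target of 2 would lead to 5 workers, and an empty queue would allow the worker deployment to scale down to zero.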

Summary

In this article I showed you how you can utilise technologies such as Dapr and KEDA to make autoscaling workers a piece of cake on Kubernetes! In just a few simple steps we can achieve integrations and monitoring that would otherwise take us a lot of infrastructure work.

I hope you enjoyed this and would love to see your opinions about it! Let me know how I can improve this article further!

Xavier Geerinck © 2020
