Kubernetes is getting more popular every day, and it's no wonder why! Whether you run applications on-premises or in the cloud, being able to package them in a portable way is a strong selling point. It removes the friction of scaling out your application when you are ready for it, and it even enables bursting scenarios.
For a use case I have been working on, I have the following requirements:
- 1 HTTP request = 1 Container
- Instance can be up to 1GB of memory
- No pub/sub, but synchronous HTTP request/response!
- Request execution time can be up to 60 seconds
- I do not want to utilize a pool, but have an auto-scaling system in place
As in any project, there are multiple paths towards a suitable solution. After researching this a bit, I came across the following tools that looked promising:
- Knative: This felt like overkill. It's quite complex to install (there are tools such as knctl, but it's outdated) and it seems quite heavy on the cluster.
- KEDA: A super interesting autoscaler, but it is event-driven and does not support HTTP-based scaling, which I would need.
- Interesting! However, it seems to keep images in memory (which would mean 1GB in memory all the time for us), and its auto-scaling is similar to KEDA in this sense (it even supports KEDA).
Looking at the above, I was kind of disappointed, since no solution seemed to exist. But after a bit more experimenting and exploring, I came across a blog post by Anirudh Garg explaining how you can hook up an Nginx Ingress controller to KEDA!
Now, the above does not solve my issue completely, seeing that I want to turn an asynchronous flow into a synchronous request (i.e. I want to receive my HTTP request, auto-scale the cluster and then return the response). Thus a custom component will have to be written in my case. The blog post does however go into how KEDA can be utilized, which seems to be an excellent fit for me!
As always, the most important thing when creating a new application is to at least draw out a high-level architecture of what the idea will look like. It might sound like a boring and unnecessary job, but it helps me a lot as a thought exercise that makes me think things through before wasting time on things that won't work. Even if it doesn't work out, you at least tried 😉
Note: you can find the full source code utilised in this post here: https://github.com/XavierGeerinck/PublicProjects/tree/master/JS/Dapr/AutoScalingHTTP
Solution (Gateway and Worker)
As for technologies, I decided to settle on the following:
Before we get started, make sure the following is available:
- Kubernetes cluster with kubectl mapped to it (Note: locally I just run minikube start --cpus 2 --memory 2048 and I'm ready to go)
We also need a queue. As shown in the architecture, we will have an inbound application that puts items on a queue. As soon as these items have been published, a worker will take them and process what it has to do (in our case, returning "Hello World" after a small delay).
To install RabbitMQ, run the following commands:
    # ref: https://github.com/bitnami/charts/tree/master/bitnami/rabbitmq/#parameters
    # Install repo
    helm repo add bitnami https://charts.bitnami.com/bitnami
    helm repo update
    # Install RabbitMQ
    helm install rabbitmq --set auth.username=admin,auth.password=admin,volumePermissions.enabled=true bitnami/rabbitmq
Note: The default username is "user" and the default hostname is "rabbitmq.default.svc.cluster.local:5672", following the Kubernetes service DNS convention of "SVC_NAME.NAMESPACE.svc.cluster.local". The moniker then takes the form amqp://<username>:<password>@rabbitmq.default.svc.cluster.local:5672
Important: If you installed RabbitMQ on this cluster before, make sure to delete the old PersistentVolumeClaim (list them with kubectl get pvc), or else the password won't work.
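To make the naming convention from the note above concrete, here is a minimal sketch of how such a connection string could be assembled. The helper names (`serviceHost`, `buildAmqpUrl`) and the `AmqpTarget` shape are my own illustration and not part of the repository; the sketch also assumes the default `cluster.local` cluster domain.

```typescript
// Hypothetical helpers for illustration only; not taken from the repository.
interface AmqpTarget {
  user: string;
  password: string;
  service: string; // Kubernetes Service name, e.g. "rabbitmq"
  namespace: string; // Kubernetes namespace, e.g. "default"
  port: number; // AMQP port, 5672 in this post
}

// In-cluster DNS name of a Service: SVC_NAME.NAMESPACE.svc.cluster.local
// (assumes the default "cluster.local" cluster domain).
function serviceHost(service: string, namespace: string): string {
  return `${service}.${namespace}.svc.cluster.local`;
}

// Percent-encode the credentials so that characters such as "@" or ":"
// inside a password cannot break the structure of the URL.
function buildAmqpUrl(target: AmqpTarget): string {
  const user = encodeURIComponent(target.user);
  const password = encodeURIComponent(target.password);
  const host = serviceHost(target.service, target.namespace);
  return `amqp://${user}:${password}@${host}:${target.port}`;
}

// Example with a made-up password that contains an "@".
console.log(
  buildAmqpUrl({
    user: "user",
    password: "s3cr@t",
    service: "rabbitmq",
    namespace: "default",
    port: 5672,
  })
);
// prints "amqp://user:s3cr%40t@rabbitmq.default.svc.cluster.local:5672"
```

Encoding the credentials is optional for simple passwords, but it avoids hard-to-spot connection errors once a password contains reserved URL characters.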
We also need Dapr on our Kubernetes cluster. We can deploy and install it by running the following commands:
    # Add helm repo
    helm repo add dapr https://dapr.github.io/helm-charts/
    helm repo update
    # Create namespace dapr-system
    kubectl create namespace dapr-system
    # Install latest dapr version
    helm install dapr dapr/dapr --namespace dapr-system
Installing Dapr Bindings for RabbitMQ
Once Dapr and RabbitMQ have been installed, we can create bindings that will allow us to receive RabbitMQ events on a specific endpoint, as well as send events to RabbitMQ by calling an HTTP endpoint.
For this we create the following YAML files and apply them on our Kubernetes cluster with kubectl apply -f binding-rabbitmq-in.yaml and kubectl apply -f binding-rabbitmq-out.yaml
    apiVersion: dapr.io/v1alpha1
    kind: Component
    metadata:
      name: rabbitmq-worker-input
      namespace: default
    spec:
      type: bindings.rabbitmq
      metadata:
      - name: host
        value: amqp://user:<password>@rabbitmq.default.svc.cluster.local:5672
      - name: queueName
        value: rabbitmq-worker-input

    apiVersion: dapr.io/v1alpha1
    kind: Component
    metadata:
      name: rabbitmq-worker-output
      namespace: default
    spec:
      type: bindings.rabbitmq
      metadata:
      - name: host
        value: amqp://user:<password>@rabbitmq.default.svc.cluster.local:5672
      - name: queueName
        value: rabbitmq-worker-output
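With these bindings in place, sending a message to RabbitMQ comes down to an HTTP call to the Dapr sidecar on its /v1.0/bindings/<name> endpoint. The sketch below only builds that request so it stays free of network calls; `buildBindingRequest` and `BindingRequest` are hypothetical names of mine, and the host and port values are the DAPR_HOST and DAPR_PORT settings used in the deployments of this post.

```typescript
// Hypothetical request description; illustration only.
interface BindingRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the HTTP request that asks the Dapr sidecar to invoke an output
// binding. "create" is the operation used to publish a message.
function buildBindingRequest(
  daprHost: string,
  daprPort: string,
  bindingName: string,
  data: unknown
): BindingRequest {
  return {
    url: `http://${daprHost}:${daprPort}/v1.0/bindings/${bindingName}`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ operation: "create", data }),
  };
}

// Example: the payload fields (id, name, timeout) are assumptions.
const exampleRequest = buildBindingRequest("127.0.0.1", "3500", "rabbitmq-worker-input", {
  id: "req-1",
  name: "Xavier Geerinck",
  timeout: 10000,
});
console.log(exampleRequest.url);
// prints "http://127.0.0.1:3500/v1.0/bindings/rabbitmq-worker-input"
```

On a recent Node.js version, the resulting object could then be handed to the global fetch (for example fetch(exampleRequest.url, exampleRequest)); I kept that out of the sketch so it can run without a sidecar.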
Creating our Gateway
On to the hardest part: the gateway! If we take another look at our architecture, we can see that 2 servers are required here, because we want to be able to separate external traffic (user requests) from the internal Dapr traffic.
We thus create 2 files: src/server-external.ts, which handles user traffic, and a second one that handles the internal Dapr traffic.
Note: For the full source code, feel free to check the repository https://github.com/XavierGeerinck/PublicProjects/tree/master/JS/Dapr/AutoScalingHTTP
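The core trick of the gateway is to keep the user's HTTP request open until the matching result comes back from the queue. The following is a minimal sketch of that idea and not the code from the repository: the `PendingRequests` class and its method names are hypothetical, and I assume every job carries a unique id that the worker echoes back.

```typescript
// One entry per HTTP request that is still waiting for its worker result.
type PendingEntry<T> = {
  resolve: (value: T) => void;
  timer: ReturnType<typeof setTimeout>;
};

// Hypothetical helper; illustration only.
class PendingRequests<T> {
  private readonly pending = new Map<string, PendingEntry<T>>();

  // Used by the external server: returns a promise that resolves once the
  // result for `id` arrives, or rejects after `timeoutMs` milliseconds.
  wait(id: string, timeoutMs: number): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(id);
        reject(new Error(`request ${id} timed out after ${timeoutMs} ms`));
      }, timeoutMs);
      this.pending.set(id, { resolve, timer });
    });
  }

  // Used by the internal server when Dapr delivers a result.
  // Returns false when nobody is waiting for this id (unknown or timed out).
  complete(id: string, result: T): boolean {
    const entry = this.pending.get(id);
    if (entry === undefined) return false;
    clearTimeout(entry.timer);
    this.pending.delete(id);
    entry.resolve(result);
    return true;
  }

  // Number of requests that are still waiting.
  get size(): number {
    return this.pending.size;
  }
}

// Example: the external side waits, the internal side completes.
const inFlight = new PendingRequests<string>();
inFlight.wait("req-1", 60000).then((body) => console.log(body));
inFlight.complete("req-1", "Hello World");
// prints "Hello World"
```

Note that this keeps state in the memory of a single gateway process, which fits the replicas: 1 deployment used in this post. With more than one gateway replica, a result would have to find its way back to the replica that holds the waiting request.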
Note: For testing purposes, it's easier to start the instance with dapr run --app-id gateway --app-port 4000 --components-path ../components npm run start:dev from the AutoScalingHTTP/Gateway folder of the source code.
Once this is done, we can deploy it through the following Kubernetes YAML:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: d-dapr-autoscaling-http-gateway
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: dapr-autoscaling-http-gateway
      template:
        metadata:
          labels:
            app: dapr-autoscaling-http-gateway
          annotations:
            dapr.io/enabled: "true" # Do we inject a sidecar to this deployment?
            dapr.io/id: "dapr-autoscaling-http-gateway" # Unique ID or Name for Dapr App (so we can communicate with it)
            dapr.io/port: "5001" # Port we are going to listen on for Dapr interactions (is app specific)
        spec:
          containers:
          - name: main # Simple name so we can reach it easily
            image: thebillkidy/dapr-autoscaling-http-gateway:latest
            imagePullPolicy: Always
            ports:
            - containerPort: 5000 # This port we will expose (external port)
            env:
            - name: DAPR_HOST
              value: "127.0.0.1"
            - name: DAPR_PORT
              value: "3500"
Apply it with kubectl apply -f deploy/k8s-gateway.yaml and use kubectl logs -f deployment/d-dapr-autoscaling-http-gateway -c main to view the logs.
Now that this is deployed, we can expose the service so we can access it externally:
    # Production clusters
    kubectl expose deployment d-dapr-autoscaling-http-gateway --type=LoadBalancer --name=svc-dapr-autoscaling-http-gateway --port=80 --target-port=5000
    # Development (minikube) cluster
    # when running on dev with minikube, we need to expose the svc like this through kubectl port-forward which will open on a random port
    kubectl port-forward --address 0.0.0.0 deployment/d-dapr-autoscaling-http-gateway :5000
Creating our Worker
The worker is a bit easier: we just listen for incoming work through the input binding that we created. Once work comes in, we do something (in this case, wait for a timeout) and then put the result back on the queue! 😀
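As a rough sketch of that worker logic (again not the repository code), the handler below waits for the requested time and then hands the result to a publish callback. The `Job` and `JobResult` shapes, `clampTimeout` and `handleJob` are hypothetical names of mine. In a real worker, Dapr would deliver the job by POSTing it to a route named after the input binding, and `publish` would call the output binding; here both are left to the caller so the sketch runs on its own.

```typescript
// Hypothetical message shapes; the field names are assumptions.
interface Job {
  id: string;
  name: string;
  timeoutMs: number;
}

interface JobResult {
  id: string;
  message: string;
}

// Requirement from the start of this post: a request takes at most 60 seconds.
const MAX_TIMEOUT_MS = 60000;

// Keep the simulated work within 0..MAX_TIMEOUT_MS, treating bad input as 0.
function clampTimeout(ms: number): number {
  if (!Number.isFinite(ms) || ms < 0) return 0;
  return Math.min(ms, MAX_TIMEOUT_MS);
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Simulate the work, then publish the result under the same id so the
// gateway can match it to the request that is still waiting.
async function handleJob(
  job: Job,
  publish: (result: JobResult) => Promise<void>
): Promise<JobResult> {
  await sleep(clampTimeout(job.timeoutMs));
  const result: JobResult = { id: job.id, message: "Hello World" };
  await publish(result);
  return result;
}

// Example: "publish" just logs here instead of calling Dapr.
handleJob({ id: "req-1", name: "Xavier Geerinck", timeoutMs: 10 }, async (result) => {
  console.log(`publishing result for ${result.id}: ${result.message}`);
});
```

Passing `publish` in as a parameter keeps the handler easy to test without a running sidecar or queue.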
Kubernetes deployment YAML:
    # Deployment
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: d-dapr-autoscaling-http-worker
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: dapr-autoscaling-http-worker
      template:
        metadata:
          labels:
            app: dapr-autoscaling-http-worker
          annotations:
            dapr.io/enabled: "true" # Do we inject a sidecar to this deployment?
            dapr.io/id: "dapr-autoscaling-http-worker" # Unique ID or Name for Dapr App (so we can communicate with it)
            dapr.io/port: "3000" # Port we are going to listen on for Dapr interactions (is app specific)
        spec:
          containers:
          - name: main # Simple name so we can reach it easily
            image: thebillkidy/dapr-autoscaling-http-worker:v0.0.1
            imagePullPolicy: Always
            ports:
            - containerPort: 3000
            env:
            - name: DAPR_HOST
              value: "127.0.0.1"
            - name: DAPR_PORT
              value: "3500"
Apply it with kubectl apply -f deploy/k8s-worker.yaml
When we now go to the URL of the Gateway and enter some query parameters (e.g. http://172.28.28.243:38203/?name=Xavier%20Geerinck&timeout=10000), we will see the following result:
This shows what we wanted to achieve: a synchronous HTTP request/response flow that works over an asynchronous queue implementation for offloading the work to workers. But how do we now go and autoscale this? Well, this is where KEDA comes in!
Setting up Autoscaling with KEDA
KEDA stands for "Kubernetes Event-driven Autoscaling" and allows us to monitor certain resources and automatically scale a deployment based on our needs. It's super easy to set up and is super powerful for horizontally scalable solutions. So let's get started on configuring this for our set-up!
The first thing we should do is to install KEDA. We can simply do this by adding the repository and installing it to our created namespace:
    helm repo add kedacore https://kedacore.github.io/charts
    helm repo update
    # Install KEDA v2
    kubectl create namespace keda
    helm install keda kedacore/keda --version 2.0.0-rc --namespace keda
Once KEDA is up and running, we need to configure it to autoscale based on the length of our RabbitMQ queue. To describe this, we can utilize a YAML configuration that details the different aspects that we want to watch and act upon.
    # https://keda.sh/docs/2.0/scalers/rabbitmq-queue/
    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: keda-dapr-autoscaling-http-worker
      namespace: default
    spec:
      scaleTargetRef:
        name: d-dapr-autoscaling-http-worker
      triggers:
      - type: rabbitmq
        metadata:
          host: amqp://admin:<password>@rabbitmq.default.svc.cluster.local:5672 # references a value of format amqp://guest:password@localhost:5672/vhost
          queueName: rabbitmq-worker-input
          queueLength: "2" # After how many do we scale up?
Once we have created this, another kubectl apply -f deploy/keda-rabbitmq-dapr-http-autoscale.yaml will do the trick to configure the autoscaling!
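To get a feeling for what the queueLength value means in practice, here is a simplified model of the scaling decision. It is a rough approximation of my own and not KEDA's actual code: KEDA passes the queue length on to the Kubernetes Horizontal Pod Autoscaler, which also applies tolerances and stabilization windows that I ignore here. The function name and the minimum and maximum values in the example are assumptions.

```typescript
// Simplified model: one replica per `targetPerReplica` queued messages,
// kept within the configured minimum and maximum replica counts.
function desiredReplicas(
  queueDepth: number,
  targetPerReplica: number,
  minReplicas: number,
  maxReplicas: number
): number {
  if (targetPerReplica <= 0) {
    throw new Error("targetPerReplica must be greater than 0");
  }
  const wanted = Math.ceil(queueDepth / targetPerReplica);
  return Math.max(minReplicas, Math.min(maxReplicas, wanted));
}

// With queueLength "2" as in the ScaledObject of this post, 10 waiting
// messages would ask for 5 workers (the bounds 0 and 100 are assumed).
console.log(desiredReplicas(10, 2, 0, 100));
// prints 5
```

Under this model, setting queueLength to "1" would come closest to the "1 HTTP request = 1 container" requirement from the start of this post.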
In this article I showed you how you can utilise technologies such as Dapr and KEDA to make autoscaling workers a piece of cake on Kubernetes! In just a few simple steps we can achieve integrations and monitoring that would otherwise take us a lot of infrastructure work.
I hope you enjoyed this and would love to see your opinions about it! Let me know how I can improve this article further!