IoT Performance Load Testing with Locust and Azure [Continuous Delivery]

This article is the second part of IoT Performance Load Testing with Locust and Azure. I encourage you to read the first part before this one, because this article refers back to it.

The previous article explained how to develop an MQTT performance load test using Locust.io and how to take advantage of its distributed mode to run thousands of devices using Azure Container Instances (ACI).

The ACI implementation was the right approach to run the tests on-demand but was not the best approach to introduce the tests as a step of our continuous delivery pipeline. Why?

It was not a scalable solution for running parallel tests, because the CPU and memory quotas are limited.
Managing the life cycle of the containers was not easy. Deleting the containers after the execution was a pain point.
Monitoring the container and test status was not as easy as we thought.

We ended up with a lot of undesired manual steps to manage the containers.

GitHub Example: https://github.com/joan-mido-qa/mqtt-locust-on-azure

The repository contains an example. You may need to adapt the code and workflows to your requirements and configuration. It may not work as it is.

Locust on K8s

K8s solves the ACI scalability problem. AKS allows the creation of multiple node pools with dedicated resources depending on the requirements of each team. Each team can use independent node pools, and the number of nodes per pool depends on the required Locust load or the required number of parallel tests. Use the node pool auto-scaler to optimize CPU and memory utilization. The user can configure a minimum number of available nodes (sized for the expected number of tests), and the auto-scaler will grow the pool when needed (e.g., a test suddenly requires more resources, the load is higher than expected, or more tests run in parallel than expected).
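
As a rough illustration of how the pool size relates to the load, the following sketch estimates the number of nodes from the CPU request of each worker. The function name and all the figures are hypothetical, and it deliberately ignores memory, the master Pods, and system Pods, so treat it as a starting point rather than a sizing rule.

```python
import math


def nodes_needed(parallel_tests: int, workers_per_test: int,
                 cpu_per_worker: float, allocatable_cpu_per_node: float) -> int:
    """Estimate how many nodes fit every worker Pod, looking at CPU requests only."""
    workers_per_node = math.floor(allocatable_cpu_per_node / cpu_per_worker)
    if workers_per_node < 1:
        raise ValueError("a single worker does not fit on one node")
    total_workers = parallel_tests * workers_per_test
    return math.ceil(total_workers / workers_per_node)


# Hypothetical figures: 3 parallel tests with 10 workers each, 0.5 CPU per worker,
# and 3.5 allocatable CPUs per node (7 workers per node, 30 workers in total).
print(nodes_needed(3, 10, 0.5, 3.5))  # 5
```

A number like this could serve as the auto-scaler maximum, with the minimum sized for the usual number of tests.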

Developers can monitor the running tests using Grafana to display the statistics, Loki to aggregate the logs of each Pod, and Prometheus to gather the Pod resource utilization.

Helm Chart

Helm is a tool to develop, package, and install Kubernetes resources. A Helm Chart contains a set of Kubernetes object templates. The user can configure the installation of the objects using a values YAML file. It also has a client which implements a collection of useful commands to install and uninstall all the released resources.

Using a Helm Chart to run Locust on Kubernetes simplifies the setup and teardown of the tests, the configuration of each run, and the local debugging of the distributed mode. The developer can run Locust using their local Kind/Minikube or AKS cluster. Running Locust just requires one command:

helm install locust ./chart -f values.yaml

Then, uninstall and remove all the resources:

helm uninstall locust
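
When the two commands are driven from a script instead of a terminal, it can help to guarantee that the uninstall always runs. The sketch below is one possible way to do it in Python; the function names and the use of a dedicated namespace are assumptions for illustration, not part of the example repository. It only uses the standard helm install and helm uninstall flags.

```python
import subprocess


def helm_install_cmd(release, chart, values_file, namespace):
    """Build the install command; --create-namespace creates the namespace if missing."""
    return ["helm", "install", release, chart, "-f", values_file,
            "--namespace", namespace, "--create-namespace"]


def helm_uninstall_cmd(release, namespace):
    """Build the command that removes every resource of the release."""
    return ["helm", "uninstall", release, "--namespace", namespace]


def run_locust(release, chart, values_file, namespace, runner=subprocess.run):
    """Install the chart and always uninstall it afterwards, even if the install fails."""
    try:
        runner(helm_install_cmd(release, chart, values_file, namespace), check=True)
    finally:
        # check=False: a failed cleanup should not hide the original error.
        runner(helm_uninstall_cmd(release, namespace), check=False)
```

The runner parameter exists only so the function can be exercised without a cluster; by default it calls the real helm client.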

GitHub Example: https://github.com/joan-mido-qa/mqtt-locust-on-azure/tree/main/charts/locust

Master Node

The Master node only requires a Pod containing the Locust container and a service to manage the workers.

The master Pod restart policy must be Never. Kubernetes should not restart the master container once the test has finished.

Worker Node

The worker node uses an Indexed Job to run all the expected workers in parallel:

apiVersion: batch/v1
kind: Job
metadata:
  name: workers
  labels:
    locust.io/node: worker
  annotations:
    "helm.sh/hook": post-install
spec:
  completions: 10
  parallelism: 10
  completionMode: Indexed
  ttlSecondsAfterFinished: 180

The Kubernetes Job will spawn the required number of workers and wait until all of them have completed. We take advantage of Helm hooks together with the Helm --wait option: Helm blocks the execution until all the workers have finished.

Sleeping for the Locust run time was not reliable. The test run time is not equal to the test duration. The test duration also includes the time it takes to scale the nodes, download the image, start the Pods, start and clean up the test, etc. The Helm --wait option accounts for all of these times because it ends when the workers die.

Use the ttlSecondsAfterFinished property to delete the Job after each execution. Another way to delete the workers right after the execution is the hook delete policy: "helm.sh/hook-delete-policy": hook-succeeded,hook-failed. The hook delete policy is not the best approach if you want to save the execution logs, because the workers are deleted as soon as the execution finishes and the GitHub workflow does not have enough time to download them.
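
A side note on the Indexed completion mode used above: Kubernetes exposes the index of each Pod through the JOB_COMPLETION_INDEX environment variable. The article does not say whether the chart relies on it, so the following is only a sketch of one possible use, giving each worker its own range of device IDs. The function names and the devices-per-worker figure are hypothetical.

```python
import os


def worker_index(environ=os.environ) -> int:
    """Read the Pod index set by an Indexed Job; default to 0 for local runs."""
    return int(environ.get("JOB_COMPLETION_INDEX", "0"))


def device_ids_for_worker(index: int, devices_per_worker: int) -> range:
    """Return a range of device IDs that does not overlap with other workers."""
    start = index * devices_per_worker
    return range(start, start + devices_per_worker)


# Hypothetical usage inside a worker: worker 2 with 3 devices gets IDs 6, 7, 8.
print(list(device_ids_for_worker(2, 3)))  # [6, 7, 8]
```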

Container Registry

The AKS cluster will need the registry credentials to download the private Locust images. The Helm Chart creates a secret that contains the required .dockerconfigjson.

{{- $registry := .Values.registry.server -}}
{{- $username := .Values.registry.user -}}
{{- $password := .Values.registry.pass -}}
---
apiVersion: v1
data:
  .dockerconfigjson: {{ printf "{\"auths\": {\"%s\": {\"auth\": \"%s\"}}}" $registry (printf "%s:%s" $username $password | b64enc) | b64enc }}
kind: Secret
metadata:
  name: container-registry
type: kubernetes.io/dockerconfigjson
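
The nested encoding in the template can be hard to read. The sketch below reproduces the same two steps in Python (Base64 of user:password inside the JSON, then Base64 of the whole JSON), which can be handy for checking the value the chart renders. The function name and the sample credentials are made up for the example.

```python
import base64
import json


def docker_config_json(server: str, user: str, password: str) -> str:
    """Return the Base64 value expected under the .dockerconfigjson key."""
    # Inner step: the "auth" field is Base64 of "user:password".
    auth = base64.b64encode(f"{user}:{password}".encode()).decode()
    config = {"auths": {server: {"auth": auth}}}
    # Outer step: Secret data values are Base64 encoded as a whole.
    return base64.b64encode(json.dumps(config).encode()).decode()


# Decode the result again to inspect what Kubernetes will see.
encoded = docker_config_json("example.azurecr.io", "u", "p")
print(json.loads(base64.b64decode(encoded)))
```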

To install the Helm Chart:

helm install locust ./chart --set registry.user="" --set registry.pass="" --set registry.server=""

Another alternative is to configure the Chart values.yaml to use an existing secret:

imagePullSecrets:
- name: container-registry

Locust on GitHub Actions

Teams can call the Locust reusable workflow to run the performance tests on demand. Each team can configure their run using the workflow inputs and secrets (e.g., load, environment, setup, etc.).

GitHub Example: https://github.com/joan-mido-qa/mqtt-locust-on-azure/blob/main/.github/workflows/run_azure_k8s.yaml

The workflow steps are the following:

Check out the Performance Code
Get the nº of workers (Users / Users per Worker)
Get the Run ID (String DateTime)
Azure Login with Azure CLI (Use a Service Principal or a Managed Identity)
Set AKS Cluster context
Install Locust Helm Chart
Poll the Master Node exit code
Upload the Master/Workers nodes log as artifacts
Uninstall Locust Helm Chart & delete the run namespace
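
The two computed values in the steps above (the number of workers and the Run ID) can be sketched as follows. Rounding up the division and the exact Run ID format are assumptions; the "locust-" prefix is hypothetical and only chosen so the ID is also a valid Kubernetes namespace name.

```python
import math
from datetime import datetime, timezone


def worker_count(users: int, users_per_worker: int) -> int:
    """Users / Users per Worker, rounded up so no user is left without a worker."""
    if users_per_worker <= 0:
        raise ValueError("users_per_worker must be positive")
    return max(1, math.ceil(users / users_per_worker))


def run_id(now=None) -> str:
    """Build a date-time based ID using only lowercase letters, digits, and hyphens."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("locust-%Y%m%d-%H%M%S")


print(worker_count(1001, 100))  # 11
print(run_id(datetime(2024, 1, 31, 10, 15, 0)))  # locust-20240131-101500
```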

To wait for the Locust tests to finish, the workflow uses kubectl to poll the master container status until it reports an exit code equal to 0 (Success), 1 (Failed), or 2 (Unexpected Quit).

#!/bin/bash

# Poll the master Pod every 5 seconds, for up to 600 seconds.
x=0

while [ "$x" -le 600 ]
do
  echo "> Checking Status ..."

  if kubectl get namespace "${{ env.RELEASE_NAME }}" >/dev/null 2>&1; then
    echo "> Namespace '${{ env.RELEASE_NAME }}' exists, continue checking status"
  else
    echo "> Namespace '${{ env.RELEASE_NAME }}' does not exist"
    echo "> Locust Failed"
    exit 1
  fi

  # jq prints "null" while the master container has not terminated yet.
  # The jq filter is quoted so the shell does not treat [0] as a glob pattern.
  exit_code=$(kubectl get pod master -n "${{ env.RELEASE_NAME }}" -ojson | jq '.status.containerStatuses[0].state.terminated.exitCode')

  # Quote the variable: it is empty if kubectl fails (e.g., the Pod is not created yet).
  if [ "$exit_code" = "null" ]; then
    echo "> Locust Running ..."

  elif [ "$exit_code" = 0 ]; then
    echo "> Locust Passed"

    exit 0

  elif [ "$exit_code" = 1 ]; then
    echo "> Locust Failed"

    exit 1

  elif [ "$exit_code" = 2 ]; then
    echo "> Locust Unexpected Quit"

    exit 1

  fi

  x=$(( x + 5 ))

  echo "> Wait 5 seconds ..."

  sleep 5

done

echo "> Timeout"

exit 1
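
The decision made by the script on each iteration can also be expressed as a small function over the JSON returned by kubectl get pod -ojson. This is only a sketch of the same logic in Python, useful for unit testing it without a cluster; the function name and the returned labels are made up for the example.

```python
def master_status(pod: dict) -> str:
    """Map the master Pod JSON to the status that the polling loop reports."""
    statuses = (pod.get("status") or {}).get("containerStatuses") or []
    if not statuses:
        return "running"  # Pod created but the container has not been reported yet
    terminated = (statuses[0].get("state") or {}).get("terminated")
    if terminated is None:
        return "running"  # container is still waiting or running
    code = terminated.get("exitCode")
    labels = {0: "passed", 1: "failed", 2: "unexpected quit"}
    return labels.get(code, f"unknown exit code {code}")


# Hypothetical Pod document, trimmed to the fields the function reads.
finished = {"status": {"containerStatuses": [{"state": {"terminated": {"exitCode": 0}}}]}}
print(master_status(finished))  # passed
```

Unlike the shell loop, this version names exit codes other than 0, 1, and 2 explicitly, so the caller can decide whether to keep waiting or fail fast.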

Alternatives

The Delivery Hero team has a Locust Helm Chart to run Locust on Kubernetes. Depending on your requirements it may be enough for your use case.

Another promising alternative is the Kubernetes Locust Operator. It’s still in development but it is well-documented and easy to use: https://abdelrhmanhamouda.github.io/locust-k8s-operator/. To use the Operator, install the corresponding Kubernetes resources and then run a test using the following object:

apiVersion: locust.io/v1 
kind: LocustTest 
metadata:
  name: demo.test 
spec:
  image: locustio/locust:latest 
  masterCommandSeed: 
    --locustfile /lotest/src/demo_test.py
    --host https://dummy.restapiexample.com
    --users 100
    --spawn-rate 3
    --run-time 3m
  workerCommandSeed: --locustfile /lotest/src/demo_test.py 
  workerReplicas: 3 
  configMap: demo-test-map

Finally, Testkube is another alternative. It does not have an implementation for Locust yet but it is possible to develop your custom implementation or use containers to run Locust.

Conclusions

The best solution is always the one that best fits your needs. There are many alternatives and approaches. We used an in-house implementation because we needed a customizable solution. Do not discard an already available market solution or a standardized one. They can speed up the development of a proof of concept (PoC), help you understand your requirements better, and reduce maintenance and development costs. Gather the learnings from the PoC development, and apply them to your next iteration.