Setting up autoscaling on Kubernetes
With Kubernetes' Horizontal Pod Autoscaling (HPA), you can automatically scale your SonarQube Server out and in, which helps absorb load-related performance issues.
The HPA increases or decreases the number of deployment replicas according to the overall CPU consumption of the SonarQube Server Pods.
Warning before you start
Currently, autoscaling targets only the Data Center Edition application nodes. The initial goal is to improve the pull request analysis time by ensuring that background tasks do not pile up.
This feature should be used with caution, as it can significantly increase costs. This is the first iteration, and future improvements will come.
We suggest disabling autoscaling during long-running upgrades to prevent unnecessary scaling caused by the initial upgrade load.
Requirements
Make sure the Metrics Server is installed in your Kubernetes cluster.
Autoscaling can function optimally only if the system does not have a bottleneck. You should monitor your system to avoid bottlenecks (see Troubleshooting autoscaling below).
Enabling autoscaling
To enable autoscaling in your DCE cluster:
- In the values.yaml file of the SonarQube Server's DCE Helm chart, set ApplicationNodes.hpa.enabled to true.
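As a minimal sketch, the setting above corresponds to the following fragment of values.yaml. The nesting is inferred from the dotted key path ApplicationNodes.hpa.enabled, and all other keys of the chart are omitted.

```yaml
# values.yaml (fragment): only the key named in this section is shown
ApplicationNodes:
  hpa:
    enabled: true   # set to false to disable autoscaling
```

The change takes effect once the Helm release is upgraded with the edited values file.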
Testing autoscaling
To test autoscaling:
- Check the pending_count and pending_time of background tasks (see Instance monitoring). If they are increasing, SonarQube Server should scale up. If not (autoscaling is not triggered), perform the steps described below.
If the autoscaling test is negative, see Troubleshooting autoscaling below.
Troubleshooting autoscaling
Autoscaling can function optimally only if the system does not have a bottleneck. You should monitor your system to avoid bottlenecks.
If autoscaling is not triggered properly:
- Change the number of workers per node that process background tasks. With the DCE Helm chart's default request/limit resources (see Default configuration in Autoscaling configuration below), set this number to 3. This is the ideal number for maximizing performance and inducing a constant load that triggers autoscaling.
- Check that your database is not under heavy load. This can happen when the database's CPU, RAM, or I/O is capped at its maximum value. Some databases also have an I/O burst balance that can be exhausted. Database I/O is very important for optimal performance.
- Perform the same checks for networking and resource caps on the reverse proxy, load balancer, network, and Kubernetes node I/O.
- If autoscaling still does not work properly, try to adjust the configuration with caution. See the Autoscaling configuration section below for details.
Disabling autoscaling
- In the values.yaml file of the SonarQube Server's DCE Helm chart, set ApplicationNodes.hpa.enabled to false.
Autoscaling configuration
Default configuration
The default autoscaling configuration in the SonarQube Server's DCE Helm chart is shown below. Note that it is designed to work with the chart's default resources (see Default resources below).
Default resources
The default autoscaling setup is designed to work with the Helm chart's default resources block shown below.
Minimum number of deployment replicas
We highly recommend not setting minReplicas below 2, but you can adjust it according to your availability needs.
Maximum number of deployment replicas
maxReplicas can be freely edited, but remember that raising it can significantly increase costs.
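A hedged sketch of how the two replica bounds might look in values.yaml. It assumes that minReplicas and maxReplicas sit under ApplicationNodes.hpa next to enabled, which this page does not state explicitly. The maxReplicas value is purely illustrative and is not the chart's default.

```yaml
# values.yaml (fragment): key placement under ApplicationNodes.hpa is an assumption
ApplicationNodes:
  hpa:
    enabled: true
    minReplicas: 2   # lower bound recommended on this page
    maxReplicas: 6   # illustrative value only; higher values can raise costs
```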
Scale-up policy
The scale-up policy (scaleUp:policies) defines the extent to which the number of Pods increases (value) during a given period of time (periodSeconds) when 80% of the CPU request is reached.
The default scale-up policy favors best-effort efficiency over cost by at most doubling the number of Pods (value = 100%) every 10 minutes. We scale up aggressively to compensate for SonarQube Server's long startup time (about a minute) and to let stabilization happen after startup:
- The 10-minute period is important as it lets the new Pod stabilize its CPU consumption at startup, preventing an autoscaling loop in which the Pods are scaled up to the maximum number directly.
- Doubling allows for an exponential scale-up that can catch up with the load and ensure a 10-minute lag at most.
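The scale-up policy described above can be sketched with the field names of the Kubernetes autoscaling/v2 HorizontalPodAutoscaler behavior block. The values come from this section. The assumption is that the chart exposes them under ApplicationNodes.hpa.behavior, so check the chart's values.yaml for the exact layout.

```yaml
# values.yaml (fragment): the behavior nesting is assumed, not confirmed by this page
ApplicationNodes:
  hpa:
    behavior:
      scaleUp:
        policies:
          - type: Percent       # value is a percentage of the current replica count
            value: 100          # add at most 100% more Pods, that is, double...
            periodSeconds: 600  # ...within any 10-minute period
```

With these values and a starting point of 2 replicas, the autoscaler could reach at most 4 Pods after the first 10-minute period and 8 after the second, always capped by maxReplicas.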
Scale-down policy
The scale-down policy (scaleDown:policies) defines the extent to which the number of Pods decreases (value) during a given period of time (periodSeconds) when the maximum CPU value over the past 60 seconds is below 80% of the CPU request.
The default scale-down policy removes 1 Pod every 20 seconds. It suits the aggressive default scale-up policy by scaling down quickly to the required number of Pods to accommodate the load.
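A matching sketch of the scale-down policy, again using the Kubernetes autoscaling/v2 behavior field names. Two assumptions apply: the chart exposes the block under ApplicationNodes.hpa.behavior, and the 60-second look-back mentioned above corresponds to stabilizationWindowSeconds.

```yaml
# values.yaml (fragment): the behavior nesting and the stabilization window mapping are assumptions
ApplicationNodes:
  hpa:
    behavior:
      scaleDown:
        stabilizationWindowSeconds: 60  # act on the highest recommendation of the past 60 seconds
        policies:
          - type: Pods         # value is an absolute number of Pods
            value: 1           # remove at most 1 Pod...
            periodSeconds: 20  # ...every 20 seconds
```

Lengthening periodSeconds or the stabilization window would make scale-down more conservative, at the price of keeping extra Pods running longer.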