
Setting up autoscaling on Kubernetes

Warning before you start

Currently, autoscaling targets only the Data Center Edition application nodes. The initial goal is to improve the pull request analysis time by ensuring that background tasks do not pile up.

This feature should be used with caution, as it can significantly increase costs. This is the first iteration of the feature, and improvements are planned.

We suggest disabling autoscaling during long-running upgrades to prevent unnecessary scaling caused by the initial upgrade load.

Requirements

Make sure the Kubernetes Metrics Server is installed in your cluster.

Autoscaling can function optimally only if the system does not have a bottleneck. You should monitor your system to avoid bottlenecks (see Troubleshooting autoscaling below).

Enabling autoscaling

To enable autoscaling in your DCE cluster:

In the values.yaml file of SonarQube's DCE Helm chart, set ApplicationNodes.hpa.enabled to true.
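As a minimal sketch, the setting above could look like this in values.yaml. The nesting is inferred from the ApplicationNodes.hpa.enabled path and from the default hpa block shown under Autoscaling configuration; only the overridden key is listed, and all other keys are assumed to keep their chart defaults.

```yaml
# values.yaml (fragment): only the overridden key is shown
ApplicationNodes:
  hpa:
    enabled: true   # chart default is false
```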

Testing autoscaling

To test autoscaling:

Check the pending_count and pending_time of background tasks (see Instance monitoring). If they are increasing, SonarQube should scale up. If it does not (autoscaling is not triggered), see Troubleshooting autoscaling below.

Troubleshooting autoscaling

Autoscaling can function optimally only if the system does not have a bottleneck. You should monitor your system to avoid bottlenecks.

If autoscaling is not triggered properly:

1. Change the number of workers per node that process background tasks. With the DCE Helm chart's default resource requests and limits (see Autoscaling configuration below), set this number to 3, which is the ideal value for maximizing performance and inducing a constant load that triggers autoscaling.
2. Check that your database is not under heavy load, for example because its CPU, RAM, or I/O is capped at the maximum value. Some databases also have an I/O burst balance that can be exhausted. Database I/O is critical for optimal performance.
3. Perform the same checks for network and resource caps on the reverse proxy, the load balancer, the network, and Kubernetes node I/O.
4. If autoscaling still does not work properly, adjust the configuration with caution. See the Autoscaling configuration section below for details.

Disabling autoscaling

In the values.yaml file of SonarQube's DCE Helm chart, set ApplicationNodes.hpa.enabled to false.

Autoscaling configuration

Default configuration

The default autoscaling configuration in SonarQube's DCE Helm chart is shown below. It is designed to work with the chart's default resources (see Default resources below).

  hpa:
    enabled: false
    minReplicas: 2
    maxReplicas: 10
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 80
    behavior:
      scaleDown:
        stabilizationWindowSeconds: 60
        policies:
          - type: Pods
            value: 1
            periodSeconds: 20
      scaleUp:
        stabilizationWindowSeconds: 0
        policies:
          - type: Percent
            value: 100
            periodSeconds: 600

Default resources

The default autoscaling setup is designed to work with the Helm chart's default resources block shown below.

  resources:
    limits:
      cpu: 800m
      memory: 3072M
    requests:
      cpu: 400m
      memory: 3072M

Minimum number of deployment replicas

We highly recommend not setting minReplicas below 2, but you can adjust according to your availability needs.

Maximum number of deployment replicas

You can set maxReplicas freely, but remember that a higher value can significantly increase costs.
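The fragment below is a sketch of how the replica bounds might be overridden in values.yaml. The value 6 for maxReplicas is purely illustrative and not a recommendation. The nesting under ApplicationNodes is assumed from the ApplicationNodes.hpa.enabled path used elsewhere on this page.

```yaml
# values.yaml (fragment): illustrative values only
ApplicationNodes:
  hpa:
    enabled: true
    minReplicas: 2   # chart default; going below 2 is not recommended
    maxReplicas: 6   # hypothetical cap, lower than the default of 10, to limit cost
```

A lower maxReplicas trades peak throughput for a bounded bill: once the cap is reached, background tasks may queue even though CPU utilization stays above the target.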

Scale-up policy

The scale-up policy (scaleUp:policies) defines by how much the number of Pods can increase (value) during a given period of time (periodSeconds) when average CPU utilization reaches 80% of the CPU request.

The default scale-up policy favors best-effort efficiency over cost by at most doubling the number of Pods (value = 100%) every 10 minutes. Scaling up is aggressive to compensate for SonarQube's long startup time (about a minute) and to let CPU usage stabilize after startup:

- The 10-minute period is important because it lets a new Pod stabilize its CPU consumption after startup, preventing an autoscaling loop in which the Pods are scaled up directly to the maximum number.
- Doubling allows for an exponential scale-up that can catch up with the load, with a lag of 10 minutes at most.
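As a worked illustration, the default scaleUp block is repeated below with comments tracing how the replica count could evolve under sustained load. The timings are approximate and assume the defaults shown on this page (minReplicas 2, maxReplicas 10, CPU request 400m) and that CPU utilization stays above the target throughout.

```yaml
# Default scale-up policy, annotated (approximate timeline under sustained load)
scaleUp:
  stabilizationWindowSeconds: 0   # react as soon as the target is exceeded
  policies:
    - type: Percent
      value: 100                  # add at most 100% of the current Pods per period
      periodSeconds: 600          # one period = 10 minutes
# Trigger: average CPU above 80% of the 400m request, that is above 320m per Pod.
# t = 0 min:   2 Pods -> up to 4
# t = 10 min:  4 Pods -> up to 8
# t = 20 min:  8 Pods -> up to 10 (doubling would give 16, capped by maxReplicas)
```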

Scale-down policy

The scale-down policy (scaleDown:policies) defines by how much the number of Pods can decrease (value) during a given period of time (periodSeconds) when the maximum CPU utilization over the past 60 seconds is below 80% of the CPU request.

The default scale-down policy removes 1 Pod every 20 seconds. It suits the aggressive default scale-up policy by scaling down quickly to the required number of Pods to accommodate the load.
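To give a sense of scale, the default scaleDown block is repeated below with comments estimating how long a full scale-down could take. This is a rough estimate assuming the load drops at once and stays low, with the default bounds of 10 and 2 replicas.

```yaml
# Default scale-down policy, annotated (rough estimate after load disappears)
scaleDown:
  stabilizationWindowSeconds: 60  # wait for 60 s of low load before removing Pods
  policies:
    - type: Pods
      value: 1                    # remove at most 1 Pod per period
      periodSeconds: 20           # one period = 20 seconds
# From 10 Pods down to 2: 8 Pods to remove x 20 s = about 160 s,
# in addition to the 60 s stabilization window.
```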
