0.14.0/0.14.1 cannot handle big runners queue #4460

@adilkhan-kushumbayev

Controller Version

0.14.0

Deployment Method

ArgoCD

Checks

  • This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

Try to run 200-300 runners simultaneously.
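
One way to produce this load is a single workflow that fans out through a matrix. The following is only a sketch under stated assumptions: the workflow name, the job name, the `sleep` duration, and the `arc-runner-set` label (standing in for the scale set's installation name) are hypothetical and not taken from our setup. A 16 x 16 matrix yields 256 jobs, which is the maximum a matrix can generate per workflow run.

```yaml
# Hypothetical reproduction workflow; all names here are illustrative.
name: arc-queue-repro
on: workflow_dispatch

jobs:
  fan-out:
    # Assumption: "arc-runner-set" is the installation name of the runner scale set.
    runs-on: arc-runner-set
    strategy:
      fail-fast: false
      matrix:
        # 16 x 16 = 256 jobs, the per-run matrix limit.
        shard: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
        part: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
    steps:
      # Keep each runner busy long enough for the jobs to overlap.
      - run: sleep 120
```

The scale set's `maxRunners` would need to be at least as large as the job count for all of these runners to exist at the same time.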

Describe the bug

After upgrading to version 0.14.0/0.14.1, we saw a steadily growing queue of runners, which made it impossible to run new jobs. During that time we were also constantly hitting the reconciliation limit.

Image

Describe the expected behavior

After downgrading to version 0.13.1, the queue dropped significantly, and we are no longer hitting the reconciliation limit.

Image

Additional Context

Parameters that we use:

replicaCount: 3

flags:
  logLevel: "debug"
  logFormat: "json"
  runnerMaxConcurrentReconciles: 10
  k8sClientRateLimiterQPS: 50
  k8sClientRateLimiterBurst: 100
  updateStrategy: "immediate"


To fix this, we raised QPS to 50 and Burst to 100, but it didn't help.
We also tried raising the reconciliation limit to 25 and then to 50. That reduced the queue for a while, but then we started hitting the reconciliation limit again.
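
For reference, a sketch of what the raised settings look like in the controller chart values. It assumes that the "reconciliation limit" above corresponds to the `runnerMaxConcurrentReconciles` flag from the parameters listed earlier; the other keys are copied from that same list.

```yaml
flags:
  # Assumption: this flag is the "reconciliation limit" referred to above.
  # Tried 25 first, then 50; both only helped temporarily.
  runnerMaxConcurrentReconciles: 25
  # Raised client-side rate limits; no visible effect on the queue.
  k8sClientRateLimiterQPS: 50
  k8sClientRateLimiterBurst: 100
```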

Controller Logs

https://gist.github.com/adilkhan-kushumbayev/a5cd8477cff64cf69249fb1d80e2dedd

Runner Pod Logs

https://gist.github.com/adilkhan-kushumbayev/a5cd8477cff64cf69249fb1d80e2dedd

Labels

  • bug (Something isn't working)
  • gha-runner-scale-set (Related to the gha-runner-scale-set mode)
  • needs triage (Requires review from the maintainers)