Checks
Controller Version
0.14.0
Deployment Method
ArgoCD
Checks
To Reproduce
Try to run 200-300 runners simultaneously.
Describe the bug
After upgrade to version 0.14.0/0.14.1 we faced issue with growing queue of runners which makes it impossible to run new jobs. Also during that time we were constantly hitting Reconcilliation limit.
Describe the expected behavior
After downgrading to version 0.13.1, the queue has dropped significantly, and we are not hitting Reconcilliation limit.
Additional Context
Parameters that we use:
replicaCount: 3
flags:
logLevel: "debug"
logFormat: "json"
runnerMaxConcurrentReconciles: 10
k8sClientRateLimiterQPS: 50
k8sClientRateLimiterBurst: 100
updateStrategy: "immediate"
To fix that we tried to bump QPS to 50, Burst to 100, but it didn't help.
We also tried to bump Reconcilliation limit to 25 and to 50. It helped to reduce queue for some time, but then we started hitting Reconcilliation limit again.
Controller Logs
https://gist.github.com/adilkhan-kushumbayev/a5cd8477cff64cf69249fb1d80e2dedd
Runner Pod Logs
https://gist.github.com/adilkhan-kushumbayev/a5cd8477cff64cf69249fb1d80e2dedd
Checks
Controller Version
0.14.0
Deployment Method
ArgoCD
Checks
To Reproduce
Describe the bug
After upgrade to version 0.14.0/0.14.1 we faced issue with growing queue of runners which makes it impossible to run new jobs. Also during that time we were constantly hitting Reconcilliation limit.
Describe the expected behavior
After downgrading to version 0.13.1, the queue has dropped significantly, and we are not hitting Reconcilliation limit.
Additional Context
Controller Logs
Runner Pod Logs