Kueue 队列在 Cohort 内的资源借用与收回
准备工作
创建 ResourceFlavor, Cohort 以及两个队列:
apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
name: default
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: Cohort
metadata:
name: "test"
spec:
resourceGroups:
- coveredResources: ["cpu"]
flavors:
- name: default
resources:
- name: cpu
nominalQuota: 2
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
name: test1
spec:
namespaceSelector: {} # match all.
cohortName: test
preemption:
reclaimWithinCohort: Any
borrowWithinCohort:
policy: Never
withinClusterQueue: Never
resourceGroups:
- coveredResources: ["cpu"]
flavors:
- name: default
resources:
- name: cpu
nominalQuota: 1
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
name: test2
spec:
namespaceSelector: {} # match all.
cohortName: test
preemption:
reclaimWithinCohort: Any
borrowWithinCohort:
policy: Never
withinClusterQueue: Never
resourceGroups:
- coveredResources: ["cpu"]
flavors:
- name: default
resources:
- name: cpu
nominalQuota: 1
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: LocalQueue
metadata:
namespace: default
name: test1
spec:
clusterQueue: test1
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: LocalQueue
metadata:
namespace: default
name: test2
spec:
clusterQueue: test2
Cohort 本身定义的额度是内部所有队列之外的,因此 Cohort 内的额度分配如下图:
pie showData title Cohort 额度分配
"队列 test1": 1
"队列 test2": 1
"额外": 2
实验一、额度借用与收回
创建作业 test1:
apiVersion: batch/v1
kind: Job
metadata:
name: test1
labels:
kueue.x-k8s.io/queue-name: test1
spec:
completions: 4
completionMode: Indexed
parallelism: 4
template:
spec:
restartPolicy: OnFailure
containers:
- image: busybox
imagePullPolicy: IfNotPresent
name: busybox-sleep
command: ["sh", "-c", "trap exit INT TERM; sleep 30s & wait"]
resources:
requests:
cpu: "1"
limits:
cpu: "1"
可见作业 test1 使用了整个 Cohort 包括队列 test2 的额度。
然后创建作业 test2:
apiVersion: batch/v1
kind: Job
metadata:
name: test2
labels:
kueue.x-k8s.io/queue-name: test2
spec:
completions: 1
completionMode: Indexed
parallelism: 1
template:
spec:
restartPolicy: OnFailure
containers:
- image: busybox
imagePullPolicy: IfNotPresent
name: busybox-sleep
command: ["sh", "-c", "trap exit INT TERM; sleep 30s & wait"]
resources:
requests:
cpu: "1"
limits:
cpu: "1"
监视作业状态变化:
$ kubectl get job -w
NAME STATUS COMPLETIONS DURATION AGE
test1 Running 0/4 0s
test1 Suspended 0/4 0s
test1 Suspended 0/4 0s
test1 Running 0/4 0s 0s
test1 Running 0/4 2s 2s
test2 Running 0/1 0s
test2 Suspended 0/1 0s
test1 Running 0/4 9s 9s
test1 Running 0/4 9s
test1 Running 0/4 9s
test1 Suspended 0/4 9s
test2 Suspended 0/1 0s
test2 Running 0/1 0s 0s
test1 Suspended 0/4 10s
test2 Running 0/1 2s 2s
test2 Running 0/1 33s 33s
test2 SuccessCriteriaMet 1/1 34s 34s
test2 Complete 1/1 34s 34s
test1 Suspended 0/4 43s
test1 Running 0/4 0s 43s
test1 Running 0/4 2s 45s
test1 Running 0/4 32s 75s
test1 SuccessCriteriaMet 4/4 33s 76s
test1 Complete 4/4 33s 76s
监视 Workload 状态变化:
$ kubectl get wl -o wide -w
NAME QUEUE RESERVED IN ADMITTED FINISHED AGE
job-test1-f14e1 test1 0s
job-test1-f14e1 test1 0s
job-test1-f14e1 test1 test1 True 0s
job-test1-f14e1 test1 test1 True 2s
job-test2-f2017 test2 0s
job-test1-f14e1 test1 test1 True 9s
job-test2-f2017 test2 0s
job-test2-f2017 test2 0s
job-test1-f14e1 test1 test1 True 9s
job-test1-f14e1 test1 False 9s
job-test1-f14e1 test1 False 9s
job-test2-f2017 test2 test2 True 0s
job-test1-f14e1 test1 False 9s
job-test1-f14e1 test1 False 9s
job-test2-f2017 test2 test2 True 2s
job-test2-f2017 test2 test2 True 33s
job-test2-f2017 test2 test2 True 34s
job-test2-f2017 test2 False True 34s
job-test2-f2017 test2 False True 34s
job-test1-f14e1 test1 test1 True 43s
job-test1-f14e1 test1 test1 True 45s
job-test1-f14e1 test1 test1 True 75s
job-test1-f14e1 test1 test1 True 76s
job-test1-f14e1 test1 False True 76s
job-test1-f14e1 test1 False True 76s
可以看到作业 test2 提交后,作业 test1 被迫退出。
实验二、只能收回本队列额度
现在将作业 test2 的并行度改为 2, 超过队列 test2 的额度:
labels:
kueue.x-k8s.io/queue-name: test2
spec:
- completions: 1
+ completions: 2
completionMode: Indexed
- parallelism: 1
+ parallelism: 2
template:
spec:
restartPolicy: OnFailure
与实验二步骤相同。监视作业状态变化:
$ kubectl get job -w
NAME STATUS COMPLETIONS DURATION AGE
test1 Running 0/4 0s
test1 Suspended 0/4 0s
test1 Suspended 0/4 0s
test1 Running 0/4 0s 0s
test1 Running 0/4 2s 2s
test2 Running 0/2 0s
test2 Suspended 0/2 0s
test1 Running 0/4 32s 32s
test1 SuccessCriteriaMet 4/4 33s 33s
test1 Complete 4/4 33s 33s
test2 Suspended 0/2 26s
test2 Running 0/2 0s 26s
test2 Running 0/2 2s 28s
test2 Running 0/2 32s 58s
test2 SuccessCriteriaMet 2/2 33s 59s
test2 Complete 2/2 33s 59s
监视 Workload 状态变化:
$ kubectl get wl -o wide -w
NAME QUEUE RESERVED IN ADMITTED FINISHED AGE
job-test1-647d2 test1 0s
job-test1-647d2 test1 0s
job-test1-647d2 test1 test1 True 0s
job-test1-647d2 test1 test1 True 2s
job-test2-05e1b test2 0s
job-test2-05e1b test2 0s
job-test2-05e1b test2 0s
job-test1-647d2 test1 test1 True 32s
job-test1-647d2 test1 test1 True 33s
job-test2-05e1b test2 26s
job-test1-647d2 test1 test1 True 33s
job-test2-05e1b test2 test2 True 26s
job-test1-647d2 test1 False True 33s
job-test1-647d2 test1 False True 33s
job-test2-05e1b test2 test2 True 28s
job-test2-05e1b test2 test2 True 58s
job-test2-05e1b test2 test2 True 59s
job-test2-05e1b test2 False True 59s
job-test2-05e1b test2 False True 59s
可以看到作业 test2 因为额度不够被挂起了。等作业 test1 完成以后,test2 又可以借用 Cohort 内的资源运行。
以上结果是因为关闭了 borrowWithinCohort 功能。如果打开的话,高优先级的作业可以挤掉其他队列里低优先级的作业。