Volcano Hierarchical Queue Experiments
Modify the Volcano scheduler configuration to enable the capacity plugin:
$ kubectl edit cm -n volcano-system volcano-scheduler-configmap
configmap/volcano-scheduler-configmap edited
The changes are as follows:
  apiVersion: v1
  data:
    volcano-scheduler.conf: |
-     actions: "enqueue, allocate, backfill"
+     actions: "enqueue, allocate, backfill, preempt, reclaim"
      tiers:
      - plugins:
        - name: priority
+       - name: overcommit
        - name: gang
          enablePreemptable: false
        - name: conformance
      - plugins:
-       - name: overcommit
        - name: drf
          enablePreemptable: false
        - name: predicates
-       - name: proportion
+       - name: capacity
+         enableHierarchy: true
        - name: nodeorder
        - name: binpack
  kind: ConfigMap
Note
The capacity plugin and the proportion plugin are mutually exclusive and cannot be enabled at the same time.
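The mutual-exclusion rule in the note can be checked mechanically before the ConfigMap is saved. The sketch below assumes the scheduler configuration has already been parsed into plain Python dicts and lists; the helper names `enabled_plugins` and `check_queue_plugins` are hypothetical and not part of Volcano.

```python
# Hypothetical helpers (not part of Volcano): they inspect a scheduler
# configuration that has already been parsed into dicts and lists.

def enabled_plugins(conf):
    """Return the set of plugin names found across all tiers."""
    return {
        plugin["name"]
        for tier in conf.get("tiers", [])
        for plugin in tier.get("plugins", [])
    }


def check_queue_plugins(conf):
    """Raise ValueError if capacity and proportion are both enabled."""
    names = enabled_plugins(conf)
    if {"capacity", "proportion"} <= names:
        raise ValueError("capacity and proportion are mutually exclusive")
    return names


# The edited configuration shown above, written out as a Python literal.
conf = {
    "actions": "enqueue, allocate, backfill, preempt, reclaim",
    "tiers": [
        {"plugins": [
            {"name": "priority"},
            {"name": "overcommit"},
            {"name": "gang", "enablePreemptable": False},
            {"name": "conformance"},
        ]},
        {"plugins": [
            {"name": "drf", "enablePreemptable": False},
            {"name": "predicates"},
            {"name": "capacity", "enableHierarchy": True},
            {"name": "nodeorder"},
            {"name": "binpack"},
        ]},
    ],
}

names = check_queue_plugins(conf)
print(sorted(names))
```

A configuration that still lists proportion next to capacity would make `check_queue_plugins` raise, which is a cheap way to catch a half-finished edit.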
Create the file hierarchical_q.yaml with the following content:
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: a
spec:
  reclaimable: true
  capability:
    cpu: 4
  deserved:
    cpu: 4
  guarantee:
    resource:
      cpu: 4
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: a1
spec:
  reclaimable: true
  parent: a
  deserved:
    cpu: 2
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: a2
spec:
  reclaimable: true
  parent: a
  deserved:
    cpu: 2
Apply it to the cluster:
$ kubectl apply -f hierarchical_q.yaml
queue.scheduling.volcano.sh/a created
queue.scheduling.volcano.sh/a1 created
queue.scheduling.volcano.sh/a2 created
List the queues:
$ kubectl get q
NAME PARENT
a root
a1 a
a2 a
default root
root
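The PARENT column above reflects the tree in which a1 and a2 hang under a. As a rough sanity check, the sketch below rebuilds that tree from the CPU values in hierarchical_q.yaml and verifies that the children's deserved CPUs fit inside the parent's. The dict layout and the helper names (`children`, `deserved_fits`, `print_tree`) are made up for illustration, and the "children must fit" rule is an assumption of this sketch rather than a statement of what Volcano enforces.

```python
# Toy model of the three queues from hierarchical_q.yaml (CPU only).
# Assumption of this sketch: the children's deserved CPUs should not
# add up to more than the parent's deserved CPUs.

queues = {
    "a":  {"parent": "root", "capability": 4, "deserved": 4, "guarantee": 4},
    "a1": {"parent": "a", "deserved": 2},
    "a2": {"parent": "a", "deserved": 2},
}


def children(name):
    """Names of the queues whose parent is `name`, in sorted order."""
    return sorted(q for q, spec in queues.items() if spec["parent"] == name)


def deserved_fits(name):
    """True if the children's deserved CPUs fit inside the parent's."""
    total = sum(queues[c].get("deserved", 0) for c in children(name))
    return total <= queues[name].get("deserved", 0)


def print_tree(name, depth=0):
    """Print the queue and its descendants, indented by depth."""
    print("  " * depth + name)
    for child in children(name):
        print_tree(child, depth + 1)


print_tree("a")
print("children fit:", deserved_fits("a"))
```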
Experiment 1: Queue resource limits and borrowing
Create a vcjob with the following configuration:
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: a1
spec:
  minAvailable: 1
  schedulerName: volcano
  queue: a1
  policies:
  - event: PodEvicted
    action: RestartJob
  tasks:
  - replicas: 8
    name: sleep-task
    policies:
    - event: TaskCompleted
      action: CompleteJob
    template:
      spec:
        restartPolicy: Never
        containers:
        - image: busybox
          imagePullPolicy: IfNotPresent
          name: busybox-sleep
          command: ["sh", "-c", "trap exit INT TERM; sleep 1m & wait"]
          resources:
            requests:
              cpu: 1
            limits:
              cpu: 1
This job is submitted to queue a1. After submitting it to the cluster, watch its status:
$ kubectl get vj -owide -w
NAME STATUS MINAVAILABLE RUNNINGS AGE QUEUE
a1 0s a1
a1 Pending 1 0s a1
a1 Pending 1 0s a1
a1 Running 1 1 2s a1
a1 Running 1 2 3s a1
a1 Running 1 3 3s a1
a1 Running 1 4 3s a1
a1 Running 1 3 64s a1
a1 Running 1 2 64s a1
a1 Running 1 1 64s a1
a1 Running 1 65s a1
a1 Running 1 1 65s a1
a1 Running 1 2 65s a1
a1 Running 1 3 66s a1
a1 Running 1 4 66s a1
a1 Running 1 3 2m7s a1
a1 Running 1 2 2m7s a1
a1 Running 1 1 2m7s a1
a1 Completing 1 2m8s a1
a1 Completed 1 2m8s a1
a1 Completed 1 2m8s a1
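As the output shows, although queue a1 sets no capability of its own, the job is still bound by its parent queue a and cannot use more than 4 CPUs. Because the job's minAvailable is 1, a single CPU is enough to satisfy it, so it is scheduled and then takes all the resources available. At this point the resources of queue a2 are effectively being "borrowed".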
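The numbers in this experiment can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the values from the queue and job definitions above; the constant names are illustrative, and it does not model how the scheduler actually decides.

```python
import math

REPLICAS = 8           # tasks in job a1
CPU_PER_TASK = 1       # requests.cpu of each task
PARENT_CAPABILITY = 4  # capability.cpu of parent queue a
A1_DESERVED = 2        # deserved.cpu of queue a1
TASK_SECONDS = 60      # each task runs "sleep 1m"

# a1 has no capability of its own, so the parent's capability is the ceiling.
concurrent = min(REPLICAS, PARENT_CAPABILITY // CPU_PER_TASK)

# Anything above a1's deserved share is borrowed from its sibling a2.
borrowed = max(0, concurrent * CPU_PER_TASK - A1_DESERVED)

# Tasks run in batches of `concurrent`, each batch lasting about a minute.
waves = math.ceil(REPLICAS / concurrent)

print("concurrent tasks:", concurrent)
print("borrowed CPUs:", borrowed)
print("waves:", waves, "(about", waves * TASK_SECONDS, "seconds)")
```

Two waves of roughly one minute each line up with the job completing after a little over two minutes in the watch output.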
Watching the Pod status reveals more detail:
$ kubectl get po -owide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
a1-sleep-task-7 0/1 Pending 0 0s <none> <none> <none> <none>
a1-sleep-task-2 0/1 Pending 0 0s <none> <none> <none> <none>
a1-sleep-task-0 0/1 Pending 0 0s <none> <none> <none> <none>
a1-sleep-task-4 0/1 Pending 0 0s <none> <none> <none> <none>
a1-sleep-task-6 0/1 Pending 0 0s <none> <none> <none> <none>
a1-sleep-task-5 0/1 Pending 0 0s <none> <none> <none> <none>
a1-sleep-task-3 0/1 Pending 0 0s <none> <none> <none> <none>
a1-sleep-task-1 0/1 Pending 0 0s <none> <none> <none> <none>
a1-sleep-task-1 0/1 Pending 0 1s <none> las1 <none> <none>
a1-sleep-task-2 0/1 Pending 0 1s <none> las1 <none> <none>
a1-sleep-task-3 0/1 Pending 0 1s <none> las1 <none> <none>
a1-sleep-task-0 0/1 Pending 0 1s <none> las1 <none> <none>
a1-sleep-task-4 0/1 Pending 0 1s <none> <none> <none> <none>
a1-sleep-task-7 0/1 Pending 0 1s <none> <none> <none> <none>
a1-sleep-task-6 0/1 Pending 0 1s <none> <none> <none> <none>
a1-sleep-task-5 0/1 Pending 0 1s <none> <none> <none> <none>
a1-sleep-task-1 0/1 ContainerCreating 0 1s <none> las1 <none> <none>
a1-sleep-task-2 0/1 ContainerCreating 0 1s <none> las1 <none> <none>
a1-sleep-task-0 0/1 ContainerCreating 0 1s <none> las1 <none> <none>
a1-sleep-task-3 0/1 ContainerCreating 0 1s <none> las1 <none> <none>
a1-sleep-task-3 0/1 ContainerCreating 0 2s <none> las1 <none> <none>
a1-sleep-task-2 0/1 ContainerCreating 0 2s <none> las1 <none> <none>
a1-sleep-task-1 0/1 ContainerCreating 0 2s <none> las1 <none> <none>
a1-sleep-task-0 0/1 ContainerCreating 0 2s <none> las1 <none> <none>
a1-sleep-task-3 1/1 Running 0 2s 192.168.221.181 las1 <none> <none>
a1-sleep-task-5 0/1 Pending 0 2s <none> <none> <none> <none>
a1-sleep-task-6 0/1 Pending 0 2s <none> <none> <none> <none>
a1-sleep-task-4 0/1 Pending 0 2s <none> <none> <none> <none>
a1-sleep-task-7 0/1 Pending 0 2s <none> <none> <none> <none>
a1-sleep-task-0 1/1 Running 0 3s 192.168.221.187 las1 <none> <none>
a1-sleep-task-1 1/1 Running 0 3s 192.168.221.185 las1 <none> <none>
a1-sleep-task-2 1/1 Running 0 3s 192.168.221.191 las1 <none> <none>
a1-sleep-task-4 0/1 Pending 0 3s <none> <none> <none> <none>
a1-sleep-task-7 0/1 Pending 0 3s <none> <none> <none> <none>
a1-sleep-task-5 0/1 Pending 0 3s <none> <none> <none> <none>
a1-sleep-task-6 0/1 Pending 0 3s <none> <none> <none> <none>
a1-sleep-task-1 0/1 Completed 0 62s 192.168.221.185 las1 <none> <none>
a1-sleep-task-3 0/1 Completed 0 62s 192.168.221.181 las1 <none> <none>
a1-sleep-task-2 0/1 Completed 0 62s 192.168.221.191 las1 <none> <none>
a1-sleep-task-0 0/1 Completed 0 63s 192.168.221.187 las1 <none> <none>
⋮
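The following can be observed here:
Although the resources allow only 4 tasks to be scheduled, the Pods for all 8 tasks are created right from the start.
The 4 scheduled tasks are all placed on the same node.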
Experiment 2: Resource reclaim
Create another vcjob with the same configuration as before, changing only the job name and the queue to a2:
  apiVersion: batch.volcano.sh/v1alpha1
  kind: Job
  metadata:
-   name: a1
+   name: a2
  spec:
    minAvailable: 1
    schedulerName: volcano
-   queue: a1
+   queue: a2
    policies:
    - event: PodEvicted
      action: RestartJob
Delete the previous job a1, then resubmit it. Once its 4 tasks are running, submit a2 and watch how the job status changes:
$ kubectl get vj -owide -w
NAME STATUS MINAVAILABLE RUNNINGS AGE QUEUE
a1 0s a1
a1 Pending 1 0s a1
a1 Pending 1 1s a1
a1 Running 1 1 4s a1
a1 Running 1 2 4s a1
a1 Running 1 3 4s a1
a1 Running 1 4 4s a1
a2 0s a2
a2 Pending 1 0s a2
a2 Pending 1 1s a2
a1 Restarting 1 15s a1
a1 Pending 1 15s a1
a1 Pending 1 17s a1
a1 Pending 1 17s a1
a1 Pending 1 17s a1
a1 Pending 1 17s a1
a2 Running 1 1 5s a2
a1 Running 1 1 18s a1
a2 Running 1 2 5s a2
a1 Running 1 2 18s a1
a2 Running 1 1 66s a2
a2 Running 1 66s a2
a1 Running 1 1 79s a1
a1 Running 1 79s a1
a1 Running 1 1 81s a1
a2 Running 1 1 68s a2
a1 Running 1 2 81s a1
a2 Running 1 2 68s a2
a1 Running 1 1 2m22s a1
a1 Running 1 2m23s a1
a2 Running 1 1 2m10s a2
a2 Running 1 2m10s a2
a2 Running 1 1 2m11s a2
a1 Running 1 1 2m24s a1
a1 Running 1 2 2m24s a1
a2 Running 1 2 2m12s a2
a2 Running 1 1 3m13s a2
a1 Running 1 1 3m26s a1
a2 Running 1 3m13s a2
a1 Running 1 3m26s a1
a2 Running 1 1 3m14s a2
a2 Running 1 2 3m14s a2
a1 Running 1 1 3m28s a1
a1 Running 1 2 3m28s a1
a1 Running 1 1 4m29s a1
a1 Completing 1 4m29s a1
a1 Completed 1 4m29s a1
a1 Completed 1 4m29s a1
a2 Running 1 1 4m16s a2
a2 Completing 1 4m16s a2
a2 Completed 1 4m16s a2
a2 Completed 1 4m16s a2
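As the output shows, when job a2 is submitted, the resources that job a1 borrowed have to be reclaimed, so a1 is restarted. The two jobs then run side by side, each within its own queue's deserved resources. Watching the Pod status makes it clearer that a job restart terminates all of the job's Pods and enqueues them again: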
$ kubectl get po -owide -w
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
⋮
a1-sleep-task-3 1/1 Running 0 3s 192.168.221.176 las1 <none> <none>
a1-sleep-task-1 1/1 Running 0 3s 192.168.221.138 las1 <none> <none>
a1-sleep-task-0 1/1 Running 0 3s 192.168.221.135 las1 <none> <none>
a1-sleep-task-2 1/1 Running 0 3s 192.168.221.141 las1 <none> <none>
a1-sleep-task-6 0/1 Pending 0 3s <none> <none> <none> <none>
a1-sleep-task-4 0/1 Pending 0 3s <none> <none> <none> <none>
a1-sleep-task-5 0/1 Pending 0 3s <none> <none> <none> <none>
a1-sleep-task-7 0/1 Pending 0 3s <none> <none> <none> <none>
a2-sleep-task-6 0/1 Pending 0 0s <none> <none> <none> <none>
a2-sleep-task-1 0/1 Pending 0 0s <none> <none> <none> <none>
a2-sleep-task-5 0/1 Pending 0 0s <none> <none> <none> <none>
a2-sleep-task-3 0/1 Pending 0 0s <none> <none> <none> <none>
a2-sleep-task-2 0/1 Pending 0 0s <none> <none> <none> <none>
a2-sleep-task-7 0/1 Pending 0 0s <none> <none> <none> <none>
a2-sleep-task-0 0/1 Pending 0 0s <none> <none> <none> <none>
a2-sleep-task-4 0/1 Pending 0 0s <none> <none> <none> <none>
a1-sleep-task-3 1/1 Running 0 14s 192.168.221.176 las1 <none> <none>
a1-sleep-task-3 1/1 Terminating 0 14s 192.168.221.176 las1 <none> <none>
a1-sleep-task-5 0/1 Pending 0 14s <none> <none> <none> <none>
a1-sleep-task-7 0/1 Pending 0 14s <none> <none> <none> <none>
a1-sleep-task-4 0/1 Pending 0 14s <none> <none> <none> <none>
a2-sleep-task-6 0/1 Pending 0 1s <none> <none> <none> <none>
a2-sleep-task-2 0/1 Pending 0 1s <none> <none> <none> <none>
a1-sleep-task-6 0/1 Pending 0 14s <none> <none> <none> <none>
a2-sleep-task-5 0/1 Pending 0 1s <none> <none> <none> <none>
a2-sleep-task-1 0/1 Pending 0 1s <none> <none> <none> <none>
a2-sleep-task-3 0/1 Pending 0 1s <none> <none> <none> <none>
a2-sleep-task-4 0/1 Pending 0 1s <none> <none> <none> <none>
a2-sleep-task-7 0/1 Pending 0 1s <none> <none> <none> <none>
a1-sleep-task-1 1/1 Terminating 0 14s 192.168.221.138 las1 <none> <none>
a2-sleep-task-0 0/1 Pending 0 1s <none> <none> <none> <none>
a1-sleep-task-3 1/1 Terminating 0 14s 192.168.221.176 las1 <none> <none>
a1-sleep-task-2 1/1 Terminating 0 14s 192.168.221.141 las1 <none> <none>
a1-sleep-task-7 0/1 Terminating 0 14s <none> <none> <none> <none>
a1-sleep-task-7 0/1 Terminating 0 14s <none> <none> <none> <none>
a1-sleep-task-5 0/1 Terminating 0 14s <none> <none> <none> <none>
a1-sleep-task-5 0/1 Terminating 0 14s <none> <none> <none> <none>
a1-sleep-task-0 1/1 Terminating 0 14s 192.168.221.135 las1 <none> <none>
a1-sleep-task-6 0/1 Terminating 0 14s <none> <none> <none> <none>
a1-sleep-task-6 0/1 Terminating 0 14s <none> <none> <none> <none>
a1-sleep-task-4 0/1 Terminating 0 14s <none> <none> <none> <none>
a1-sleep-task-4 0/1 Terminating 0 14s <none> <none> <none> <none>
a1-sleep-task-7 0/1 Pending 0 0s <none> <none> <none> <none>
a1-sleep-task-5 0/1 Pending 0 0s <none> <none> <none> <none>
a1-sleep-task-6 0/1 Pending 0 0s <none> <none> <none> <none>
a1-sleep-task-4 0/1 Pending 0 0s <none> <none> <none> <none>
a1-sleep-task-3 1/1 Terminating 0 15s 192.168.221.176 las1 <none> <none>
a1-sleep-task-2 1/1 Terminating 0 15s 192.168.221.141 las1 <none> <none>
a1-sleep-task-1 1/1 Terminating 0 15s 192.168.221.138 las1 <none> <none>
a1-sleep-task-0 1/1 Terminating 0 15s 192.168.221.135 las1 <none> <none>
a1-sleep-task-3 0/1 Error 0 15s 192.168.221.176 las1 <none> <none>
a1-sleep-task-1 0/1 Error 0 15s 192.168.221.138 las1 <none> <none>
a1-sleep-task-2 0/1 Error 0 15s 192.168.221.141 las1 <none> <none>
a1-sleep-task-0 0/1 Error 0 15s 192.168.221.135 las1 <none> <none>
a1-sleep-task-7 0/1 Pending 0 1s <none> <none> <none> <none>
a1-sleep-task-6 0/1 Pending 0 1s <none> <none> <none> <none>
a2-sleep-task-7 0/1 Pending 0 2s <none> <none> <none> <none>
a2-sleep-task-6 0/1 Pending 0 2s <none> <none> <none> <none>
a2-sleep-task-5 0/1 Pending 0 2s <none> <none> <none> <none>
a2-sleep-task-4 0/1 Pending 0 2s <none> <none> <none> <none>
a2-sleep-task-3 0/1 Pending 0 2s <none> <none> <none> <none>
a2-sleep-task-2 0/1 Pending 0 2s <none> <none> <none> <none>
a2-sleep-task-1 0/1 Pending 0 2s <none> las1 <none> <none>
a1-sleep-task-4 0/1 Pending 0 1s <none> las1 <none> <none>
a1-sleep-task-5 0/1 Pending 0 1s <none> las1 <none> <none>
a2-sleep-task-0 0/1 Pending 0 2s <none> las1 <none> <none>
a2-sleep-task-1 0/1 ContainerCreating 0 2s <none> las1 <none> <none>
a1-sleep-task-4 0/1 ContainerCreating 0 1s <none> las1 <none> <none>
a2-sleep-task-0 0/1 ContainerCreating 0 2s <none> las1 <none> <none>
a1-sleep-task-5 0/1 ContainerCreating 0 1s <none> las1 <none> <none>
a1-sleep-task-1 0/1 Error 0 16s 192.168.221.138 las1 <none> <none>
a1-sleep-task-1 0/1 Error 0 16s 192.168.221.138 las1 <none> <none>
a1-sleep-task-1 0/1 Pending 0 0s <none> <none> <none> <none>
a1-sleep-task-0 0/1 Error 0 16s 192.168.221.135 las1 <none> <none>
a1-sleep-task-0 0/1 Error 0 16s 192.168.221.135 las1 <none> <none>
a1-sleep-task-0 0/1 Pending 0 0s <none> <none> <none> <none>
a1-sleep-task-2 0/1 Error 0 16s 192.168.221.141 las1 <none> <none>
a1-sleep-task-2 0/1 Error 0 16s 192.168.221.141 las1 <none> <none>
a1-sleep-task-2 0/1 Pending 0 0s <none> <none> <none> <none>
a1-sleep-task-3 0/1 Error 0 16s 192.168.221.176 las1 <none> <none>
a1-sleep-task-3 0/1 Error 0 16s 192.168.221.176 las1 <none> <none>
a1-sleep-task-3 0/1 Pending 0 0s <none> <none> <none> <none>
a2-sleep-task-1 0/1 ContainerCreating 0 3s <none> las1 <none> <none>
a1-sleep-task-4 0/1 ContainerCreating 0 2s <none> las1 <none> <none>
a1-sleep-task-5 0/1 ContainerCreating 0 2s <none> las1 <none> <none>
a2-sleep-task-0 0/1 ContainerCreating 0 3s <none> las1 <none> <none>
a2-sleep-task-2 0/1 Pending 0 3s <none> <none> <none> <none>
a2-sleep-task-5 0/1 Pending 0 3s <none> <none> <none> <none>
a2-sleep-task-3 0/1 Pending 0 3s <none> <none> <none> <none>
a2-sleep-task-7 0/1 Pending 0 3s <none> <none> <none> <none>
a2-sleep-task-4 0/1 Pending 0 3s <none> <none> <none> <none>
a2-sleep-task-6 0/1 Pending 0 3s <none> <none> <none> <none>
a1-sleep-task-3 0/1 Pending 0 0s <none> <none> <none> <none>
a1-sleep-task-2 0/1 Pending 0 0s <none> <none> <none> <none>
a1-sleep-task-6 0/1 Pending 0 2s <none> <none> <none> <none>
a1-sleep-task-1 0/1 Pending 0 0s <none> <none> <none> <none>
a1-sleep-task-7 0/1 Pending 0 2s <none> <none> <none> <none>
a1-sleep-task-0 0/1 Pending 0 0s <none> <none> <none> <none>
a2-sleep-task-1 1/1 Running 0 4s 192.168.221.160 las1 <none> <none>
a1-sleep-task-4 1/1 Running 0 3s 192.168.221.184 las1 <none> <none>
a2-sleep-task-0 1/1 Running 0 4s 192.168.221.146 las1 <none> <none>
a1-sleep-task-5 1/1 Running 0 3s 192.168.221.140 las1 <none> <none>
⋮
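The before and after of this experiment can be summarized with a toy allocation rule: give every queue up to its deserved share first, then lend whatever is left. The sketch below is a deliberately simplified model that happens to match the observed numbers. The function `allocate` and its lending order are assumptions made for illustration, not Volcano's actual algorithm.

```python
def allocate(capability, deserved, demand):
    """Toy allocation: satisfy deserved shares first, then lend the rest.

    capability: CPUs available in the parent queue
    deserved:   {queue: deserved CPUs}
    demand:     {queue: CPUs requested by the queue's tasks}
    Assumes the deserved shares add up to at most `capability`.
    """
    assert sum(deserved.values()) <= capability
    alloc = {q: min(demand.get(q, 0), deserved[q]) for q in deserved}
    spare = capability - sum(alloc.values())
    for q in sorted(deserved):  # lend spare CPUs in name order (arbitrary)
        extra = min(spare, demand.get(q, 0) - alloc[q])
        alloc[q] += extra
        spare -= extra
    return alloc


deserved = {"a1": 2, "a2": 2}

# Only job a1 is submitted: it borrows the CPUs a2 is not using.
before = allocate(4, deserved, {"a1": 8})

# Job a2 arrives: both queues fall back to their deserved shares.
after = allocate(4, deserved, {"a1": 8, "a2": 8})

print("before a2:", before)
print("after a2: ", after)
print("CPUs reclaimed from a1:", before["a1"] - after["a1"])
```

In the real run the reclaim is not this gentle: the PodEvicted policy with action RestartJob causes the whole of job a1 to restart, as the Pod listing above shows.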
Experiment 3: Exclusive resources
Repeat Experiment 1, but first change queue a2's deserved resources to guarantee:
$ kubectl edit q a2
queue.scheduling.volcano.sh/a2 edited
The changes are as follows:
  kind: Queue
  ...
  spec:
-   deserved:
-     cpu: 2
+   guarantee:
+     resource:
+       cpu: 2
    parent: a
    reclaimable: true
    weight: 1
Submit job a1 and watch how its status changes:
$ kubectl get vj -owide -w
NAME STATUS MINAVAILABLE RUNNINGS AGE QUEUE
a1 0s a1
a1 Pending 1 0s a1
a1 Pending 1 0s a1
a1 Running 1 1 2s a1
a1 Running 1 2 2s a1
a1 Running 1 1 64s a1
a1 Running 1 64s a1
a1 Running 1 1 65s a1
a1 Running 1 2 66s a1
a1 Running 1 1 2m7s a1
a1 Running 1 2m7s a1
a1 Running 1 1 2m9s a1
a1 Running 1 2 2m10s a1
a1 Running 1 1 3m10s a1
a1 Running 1 3m10s a1
a1 Running 1 1 3m13s a1
a1 Running 1 2 3m13s a1
a1 Running 1 1 4m13s a1
a1 Completing 1 4m14s a1
a1 Completed 1 4m14s a1
a1 Completed 1 4m14s a1
As the output shows, job a1 can no longer borrow resources from queue a2; in effect, queue a2's resources are reserved exclusively for a2.
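One way to read this result is that CPUs guaranteed to a sibling are simply off limits to a1. The sketch below works through that arithmetic with the values used in this experiment; `borrowable_ceiling` is a hypothetical helper, and the rule it encodes is a simplified interpretation of the observed output rather than a description of the scheduler's internals.

```python
import math


def borrowable_ceiling(parent_capability, sibling_guarantees):
    """CPUs a queue can reach when its siblings' guaranteed CPUs are off limits."""
    return parent_capability - sum(sibling_guarantees)


REPLICAS = 8       # tasks in job a1, 1 CPU each
TASK_SECONDS = 60  # each task runs "sleep 1m"

# Parent queue a has capability 4; sibling a2 now has guarantee 2.
ceiling = borrowable_ceiling(4, [2])
waves = math.ceil(REPLICAS / ceiling)

print("a1 can run at most", ceiling, "tasks at once")
print("waves:", waves, "(about", waves * TASK_SECONDS, "seconds)")
```

Four waves of about a minute each are consistent with the job finishing after a little over four minutes, compared with roughly two minutes in Experiment 1.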