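Volcano Binpack Scheduling Experiment
The experiment uses two nodes in the cluster, las1 and las2. Each node has 16 CPUs and 8 GPUs.
The experiment is designed as follows:
Start a Pod on las1 that occupies half of its CPUs. The Pod spec is: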
apiVersion: v1
kind: Pod
metadata:
  name: cpu-dominant
spec:
  nodeName: las1
  schedulerName: volcano
  restartPolicy: OnFailure
  containers:
    - image: busybox:1.37.0-glibc
      imagePullPolicy: IfNotPresent
      name: main
      command: ["sh", "-c", "trap exit INT TERM; sleep infinity & wait"]
      resources:
        requests:
          cpu: "8"
          memory: 1Gi
          nvidia.com/gpu: "0"
        limits:
          cpu: "8"
          memory: 1Gi
          nvidia.com/gpu: "0"
Start a Pod on las2 that occupies half of its GPUs. The Pod spec changes as follows:
  apiVersion: v1
  kind: Pod
  metadata:
-   name: cpu-dominant
+   name: gpu-dominant
  spec:
-   nodeName: las1
+   nodeName: las2
    schedulerName: volcano
    restartPolicy: OnFailure
    containers:
        command: ["sh", "-c", "trap exit INT TERM; sleep infinity & wait"]
        resources:
          requests:
-           cpu: "8"
+           cpu: "1"
            memory: 1Gi
-           nvidia.com/gpu: "0"
+           nvidia.com/gpu: "4"
          limits:
-           cpu: "8"
+           cpu: "1"
            memory: 1Gi
-           nvidia.com/gpu: "0"
+           nvidia.com/gpu: "4"
Configuring Binpack to favor CPU
Edit the Volcano scheduler configuration:
$ kubectl edit cm -n volcano-system volcano-scheduler-configmap
Make the following changes:
        - name: proportion
        - name: nodeorder
        - name: binpack
+         arguments:
+           binpack.weight: 100
+           binpack.cpu: 1
+           binpack.resources: nvidia.com/gpu
+           binpack.resources.nvidia.com/gpu: 0
  kind: ConfigMap
  ...
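Here binpack.weight is set to a large value so that the binpack score outweighs the other scoring factors.
Start two Pods. The first Pod requests 1 CPU, with the following spec: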
apiVersion: v1
kind: Pod
metadata:
  name: cpu-only
spec:
  schedulerName: volcano
  restartPolicy: OnFailure
  containers:
    - image: busybox:1.37.0-glibc
      imagePullPolicy: IfNotPresent
      name: gpu-ubuntu
      command: ["sh", "-c", "trap exit INT TERM; sleep infinity & wait"]
      resources:
        requests:
          cpu: "1"
          memory: 1Gi
          nvidia.com/gpu: "0"
        limits:
          cpu: "1"
          memory: 1Gi
          nvidia.com/gpu: "0"
The second Pod requests one GPU in addition to 1 CPU. The spec changes as follows:
  apiVersion: v1
  kind: Pod
  metadata:
-   name: cpu-only
+   name: cpu-gpu
  spec:
    schedulerName: volcano
    restartPolicy: OnFailure
          requests:
            cpu: "1"
            memory: 1Gi
-           nvidia.com/gpu: "0"
+           nvidia.com/gpu: "1"
          limits:
            cpu: "1"
            memory: 1Gi
-           nvidia.com/gpu: "0"
+           nvidia.com/gpu: "1"
After both Pods have started, the scheduling result is:
$ kubectl get po -owide
NAME           READY   STATUS    RESTARTS   AGE   IP                NODE   NOMINATED NODE   READINESS GATES
cpu-dominant   1/1     Running   0          24s   192.168.221.164   las1   <none>           <none>
cpu-gpu        1/1     Running   0          1s    192.168.221.165   las1   <none>           <none>
cpu-only       1/1     Running   0          5s    192.168.221.151   las1   <none>           <none>
gpu-dominant   1/1     Running   0          20s   192.168.67.165    las2   <none>           <none>
Both newly started Pods were scheduled onto las1, the node with the higher CPU utilization.
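The result can be reproduced with a small scoring model. The sketch below is a simplified approximation of a binpack score (weighted utilization after placement, normalized by the sum of weights), not Volcano's actual code. The names `binpack_score`, `rank`, and `MAX_NODE_SCORE`, the 64 GiB memory capacity, and the default memory weight of 1 are assumptions made for illustration.

```python
# Simplified model of binpack scoring; NOT Volcano's actual implementation.
MAX_NODE_SCORE = 100  # assumed upper bound of a per-node score


def binpack_score(request, used, allocatable, weights, binpack_weight):
    """Score a node: higher means the Pod packs the node more tightly."""
    score, weight_sum = 0.0, 0
    for name, amount in request.items():
        if amount == 0 or name not in weights:
            continue  # resources the Pod does not request are ignored
        final = used.get(name, 0) + amount
        if final > allocatable[name]:
            return 0.0  # the Pod does not fit on this node
        score += final * weights[name] / allocatable[name]
        weight_sum += weights[name]
    if weight_sum > 0:
        score /= weight_sum
    return score * MAX_NODE_SCORE * binpack_weight


# Assumption: both nodes have 64 GiB of memory (the write-up does not say).
ALLOCATABLE = {"cpu": 16, "memory": 64, "nvidia.com/gpu": 8}
NODES = {
    "las1": {"cpu": 8, "memory": 1, "nvidia.com/gpu": 0},  # runs cpu-dominant
    "las2": {"cpu": 1, "memory": 1, "nvidia.com/gpu": 4},  # runs gpu-dominant
}
# binpack.cpu: 1 and GPU weight 0; binpack.memory is assumed to default to 1.
CPU_FIRST = {"cpu": 1, "memory": 1, "nvidia.com/gpu": 0}
PODS = {
    "cpu-only": {"cpu": 1, "memory": 1, "nvidia.com/gpu": 0},
    "cpu-gpu": {"cpu": 1, "memory": 1, "nvidia.com/gpu": 1},
}


def rank(pod, weights, binpack_weight=100):
    """Return the model score of every node for the named Pod."""
    return {
        node: binpack_score(PODS[pod], used, ALLOCATABLE, weights, binpack_weight)
        for node, used in NODES.items()
    }


for pod in PODS:
    print(pod, rank(pod, CPU_FIRST))
```

Under these assumptions the model gives las1 a score of 2968.75 and las2 a score of 781.25 for both Pods, which matches the observed placement. For simplicity each Pod is scored against the initial node state, ignoring the other new Pod.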
Configuring Binpack to favor GPU
Edit the Volcano scheduler configuration:
$ kubectl edit cm -n volcano-system volcano-scheduler-configmap
Make the following changes:
  apiVersion: v1
  data:
    volcano-scheduler.conf: |
-     actions: "enqueue, allocate, backfill"
+     actions: "enqueue, allocate, backfill, preempt, reclaim"
      tiers:
      - plugins:
        - name: priority
+       - name: overcommit
        - name: gang
          enablePreemptable: false
        - name: conformance
      - plugins:
-       - name: overcommit
        - name: drf
          enablePreemptable: false
+       - name: deviceshare
+         arguments:
+           deviceshare.VGPUEnable: true
        - name: predicates
-       - name: proportion
+       - name: capacity
+         enableHierarchy: true
        - name: nodeorder
        - name: binpack
          arguments:
            binpack.weight: 100
-           binpack.cpu: 1
+           binpack.cpu: 0
            binpack.resources: nvidia.com/gpu
-           binpack.resources.nvidia.com/gpu: 0
+           binpack.resources.nvidia.com/gpu: 1
  kind: ConfigMap
  ...
Delete the two Pods created above:
$ kubectl delete po cpu-only cpu-gpu
pod "cpu-only" deleted
pod "cpu-gpu" deleted
Recreate the two Pods and observe the scheduling result:
$ kubectl get po -owide
NAME           READY   STATUS    RESTARTS   AGE   IP                NODE   NOMINATED NODE   READINESS GATES
cpu-dominant   1/1     Running   0          75s   192.168.221.136   las1   <none>           <none>
cpu-gpu        1/1     Running   0          50s   192.168.67.129    las2   <none>           <none>
cpu-only       1/1     Running   0          58s   192.168.221.139   las1   <none>           <none>
gpu-dominant   1/1     Running   0          70s   192.168.67.153    las2   <none>           <none>
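The cpu-only Pod was again scheduled onto las1, the node with the higher CPU utilization, while the cpu-gpu Pod was scheduled onto las2, the node with the higher GPU utilization.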
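To see which resource separates the two nodes under the GPU-first weights, the sketch below prints the per-resource terms of a simplified binpack model (weighted utilization after placement). It is an approximation for illustration, not Volcano's code. The helper `binpack_terms`, the 64 GiB memory capacity, and the default memory weight of 1 are assumptions.

```python
# Per-resource breakdown of a simplified binpack model; NOT Volcano's code.

def binpack_terms(request, used, allocatable, weights):
    """Weighted utilization of each requested resource after placing the Pod."""
    return {
        name: (used[name] + amount) * weights[name] / allocatable[name]
        for name, amount in request.items()
        if amount > 0 and name in weights
    }


# Assumption: both nodes have 64 GiB of memory (the write-up does not say).
ALLOCATABLE = {"cpu": 16, "memory": 64, "nvidia.com/gpu": 8}
NODES = {
    "las1": {"cpu": 8, "memory": 1, "nvidia.com/gpu": 0},  # runs cpu-dominant
    "las2": {"cpu": 1, "memory": 1, "nvidia.com/gpu": 4},  # runs gpu-dominant
}
# binpack.cpu: 0 and GPU weight 1; binpack.memory is assumed to default to 1.
GPU_FIRST = {"cpu": 0, "memory": 1, "nvidia.com/gpu": 1}
PODS = {
    "cpu-only": {"cpu": 1, "memory": 1, "nvidia.com/gpu": 0},
    "cpu-gpu": {"cpu": 1, "memory": 1, "nvidia.com/gpu": 1},
}

for pod, request in PODS.items():
    for node, used in NODES.items():
        print(pod, node, binpack_terms(request, used, ALLOCATABLE, GPU_FIRST))
```

In this model the GPU term for cpu-gpu is 0.125 on las1 and 0.625 on las2, so las2 wins. For cpu-only every term is identical on both nodes, which suggests that its placement on las1 is decided by other plugins or by tie-breaking rather than by binpack.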
Summary
The Volcano scheduler supports Binpack, but the binpack plugin arguments must be configured correctly.
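binpack.weight is the overall weight of the binpack score.
binpack.cpu is the weight of the CPU score.
binpack.memory is the weight of the memory score.
binpack.resources lists the additional (extended or custom) resources to score, separated by commas.
binpack.resources.nvidia.com/gpu is the weight of the score for the custom resource nvidia.com/gpu.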
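As a rough illustration of how these arguments fit together, the sketch below turns an `arguments` mapping into an overall weight plus per-resource weights. The function name `parse_binpack_arguments` is hypothetical, and the default value of 1 for any missing weight is an assumption, not something stated in this write-up.

```python
def parse_binpack_arguments(arguments):
    """Return (overall_weight, per_resource_weights) from binpack arguments.

    Assumption: a weight that is not set explicitly defaults to 1.
    """
    weights = {
        "cpu": int(arguments.get("binpack.cpu", 1)),
        "memory": int(arguments.get("binpack.memory", 1)),
    }
    # binpack.resources is a comma-separated list of extra resource names.
    extra = str(arguments.get("binpack.resources", ""))
    for name in (part.strip() for part in extra.split(",")):
        if name:
            key = f"binpack.resources.{name}"
            weights[name] = int(arguments.get(key, 1))
    return int(arguments.get("binpack.weight", 1)), weights


# The GPU-first configuration used in the experiment.
gpu_first = {
    "binpack.weight": 100,
    "binpack.cpu": 0,
    "binpack.resources": "nvidia.com/gpu",
    "binpack.resources.nvidia.com/gpu": 1,
}
print(parse_binpack_arguments(gpu_first))
```

Note that a resource only takes part in scoring if it is named in binpack.resources; setting binpack.resources.nvidia.com/gpu alone would have no effect in this sketch.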