Dynamic Resource Allocation
相关知识请参阅 Kubernetes 官网文档 https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/, 目前是 1.34 版本。
Kubernetes 1.32
请参阅 Kubernetes 官网文档 https://v1-32.docs.kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/.
DynamicResourceAllocation 在 Kubernetes 1.32 上为 beta 特性,需要额外参数启用。如果集群是用 kubeadm 安装的,控制平面运行在 Pod 里,可用以下命令检查:
$ kubectl get po -n kube-system -l tier=control-plane
NAME READY STATUS RESTARTS AGE
etcd-las0 1/1 Running 3 (23d ago) 177d
kube-apiserver-las0 1/1 Running 0 23h
kube-controller-manager-las0 1/1 Running 1 (23h ago) 23h
kube-scheduler-las0 1/1 Running 0 23h
这种情况下,需要在所有控制平面节点上修改以下三个文件:
/etc/kubernetes/manifests/kube-apiserver.yaml- --service-cluster-ip-range=10.96.0.0/12 - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key + - --feature-gates=DynamicResourceAllocation=true + - --runtime-config=resource.k8s.io/v1beta1=true image: registry.aliyuncs.com/google_containers/kube-apiserver:v1.32.0 imagePullPolicy: IfNotPresent livenessProbe:
/etc/kubernetes/manifests/kube-controller-manager.yaml- --service-account-private-key-file=/etc/kubernetes/pki/sa.key - --service-cluster-ip-range=10.96.0.0/12 - --use-service-account-credentials=true + - --feature-gates=DynamicResourceAllocation=true image: registry.aliyuncs.com/google_containers/kube-controller-manager:v1.32.0 imagePullPolicy: IfNotPresent livenessProbe:
/etc/kubernetes/manifests/kube-scheduler.yaml- --bind-address=127.0.0.1 - --kubeconfig=/etc/kubernetes/scheduler.conf - --leader-elect=true + - --feature-gates=DynamicResourceAllocation=true image: registry.aliyuncs.com/google_containers/kube-scheduler:v1.32.0 imagePullPolicy: IfNotPresent livenessProbe:
修改完成后相关的 Pod 会自动重启。
安装 dra-example-driver
dra-example-driver 是一个 DRA 设备驱动的 DEMO.
下载源码:
$ git clone git@github.com:kubernetes-sigs/dra-example-driver.git
构建驱动(使用 docker 需要设置环境变量)
$ cd dra-example-driver/
$ CONTAINER_TOOL=docker ./demo/build-driver.sh
使用 helm 部署到集群:
$ helm upgrade -i --create-namespace --namespace dra-example-driver dra-example-driver deployments/helm/dra-example-driver
Release "dra-example-driver" does not exist. Installing it now.
NAME: dra-example-driver
LAST DEPLOYED: Tue Nov 4 17:45:05 2025
NAMESPACE: dra-example-driver
STATUS: deployed
REVISION: 1
TEST SUITE: None
查看其 Workloads:
$ kubectl get all -n dra-example-driver
NAME READY STATUS RESTARTS AGE
pod/dra-example-driver-kubeletplugin-67x59 1/1 Running 0 3m53s
pod/dra-example-driver-kubeletplugin-j2bl2 1/1 Running 0 3m53s
pod/dra-example-driver-kubeletplugin-ndsw9 1/1 Running 0 3m53s
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/dra-example-driver-kubeletplugin 3 3 3 3 3 <none> 3m53s
查看生成的 ResourcesSlices 和 DeviceClasses:
$ kubectl get resourceslice
NAME NODE DRIVER POOL AGE
las1-gpu.example.com-n297b las1 gpu.example.com las1 4m8s
las2-gpu.example.com-tpwcb las2 gpu.example.com las2 4m18s
las3-gpu.example.com-m4zn4 las3 gpu.example.com las3 4m22s
$ kubectl get deviceclasses
NAME AGE
gpu.example.com 7m39s
进一步查看 ResourceSlice 的说明:
$ kdesc resourceslice las1-gpu.example.com-n297b
Name: las1-gpu.example.com-n297b
Namespace:
Labels: <none>
Annotations: <none>
API Version: resource.k8s.io/v1beta1
Kind: ResourceSlice
⋮
Spec:
Devices:
Basic:
Attributes:
Driver Version:
Version: 1.0.0
Index:
Int: 0
Model:
String: LATEST-GPU-MODEL
Uuid:
String: gpu-94011f0b-8dcd-b4b0-cd99-40eab2e3c96a
Capacity:
Memory:
Value: 80Gi
Name: gpu-0
⋮
可以看到生成了名为 gpu-* 的设备(实际上每个节点上有 8 个)。
Note
驱动卸载时没有删除 ResourceSlices. 用以下命令删除:
$ kubectl delete resourceslice --field-selector spec.driver=gpu.example.com
resourceslice.resource.k8s.io "las1-gpu.example.com-n297b" deleted
resourceslice.resource.k8s.io "las2-gpu.example.com-tpwcb" deleted
resourceslice.resource.k8s.io "las3-gpu.example.com-m4zn4" deleted
测试
在集群内创建一个 ResourceClaimTemplate:
$ kubectl apply -f example_resourceclaimtemplate.yaml
resourceclaimtemplate.resource.k8s.io/example created
其定义如下:
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
name: example
spec:
spec:
devices:
requests:
- name: example-claim
deviceClassName: gpu.example.com
selectors:
- cel:
expression: |-
device.attributes["gpu.example.com"].model == "LATEST-GPU-MODEL"
&& device.capacity["gpu.example.com"].memory == quantity("80Gi")
再创建一个 Pod 进行测试。Pod 的定义如下:
apiVersion: v1
kind: Pod
metadata:
name: example-claim
spec:
restartPolicy: Never
containers:
- image: busybox:1.37.0-glibc
imagePullPolicy: IfNotPresent
name: sleep-busybox
command: ["sh", "-c", "trap exit INT TERM; sleep 1m & wait"]
resources:
requests:
cpu: "1"
memory: 100Mi
limits:
cpu: "1"
memory: 100Mi
claims:
- name: example
resourceClaims:
- name: example
resourceClaimTemplateName: example
创建 Pod 时可以监视 ResourceClaim 资源的变化:
$ kubectl get resourceclaim -w
NAME STATE AGE
example-claim-example-4cb2t pending 0s
example-claim-example-4cb2t pending 0s
example-claim-example-4cb2t allocated,reserved 0s
example-claim-example-4cb2t pending 63s
example-claim-example-4cb2t pending 63s
example-claim-example-4cb2t pending 63s
这种自动生成的 ResourceClaim 的所有者是这个 Pod, 当 Pod 被删除时它也被删除。
Kubernetes 1.34
把集群升级到 1.34:
$ kubectl get no
NAME STATUS ROLES AGE VERSION
las0 Ready control-plane 189d v1.34.2
las1 Ready <none> 189d v1.34.2
las2 Ready <none> 189d v1.34.2
las3 Ready <none> 185d v1.34.2
DynamicResourceAllocation 特性在 Kubernetes 1.34 上默认启用,所以之前的额外参数可以去掉,但别忘了升级服务映像的版本:
/etc/kubernetes/manifests/kube-apiserver.yaml- --service-cluster-ip-range=10.96.0.0/12 - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key - image: registry.aliyuncs.com/google_containers/kube-apiserver:v1.32.0 + image: registry.aliyuncs.com/google_containers/kube-apiserver:v1.34.0 imagePullPolicy: IfNotPresent livenessProbe: failureThreshold: 8
/etc/kubernetes/manifests/kube-controller-manager.yaml- --service-account-private-key-file=/etc/kubernetes/pki/sa.key - --service-cluster-ip-range=10.96.0.0/12 - --use-service-account-credentials=true - image: registry.aliyuncs.com/google_containers/kube-controller-manager:v1.32.0 + image: registry.aliyuncs.com/google_containers/kube-controller-manager:v1.34.0 imagePullPolicy: IfNotPresent livenessProbe: failureThreshold: 8
/etc/kubernetes/manifests/kube-scheduler.yaml- --bind-address=127.0.0.1 - --kubeconfig=/etc/kubernetes/scheduler.conf - --leader-elect=true - image: registry.aliyuncs.com/google_containers/kube-scheduler:v1.32.0 + image: registry.aliyuncs.com/google_containers/kube-scheduler:v1.34.0 imagePullPolicy: IfNotPresent livenessProbe: failureThreshold: 8
检查 API 版本以确认:
$ kubectl api-versions | grep resource.k8s.io
resource.k8s.io/v1
重新安装 dra-example-driver.
ResourceClaimTemplate 需要修改:
-apiVersion: resource.k8s.io/v1beta1
+apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
name: example
devices:
requests:
- name: example-claim
- deviceClassName: gpu.example.com
- selectors:
- - cel:
- expression: |-
- device.attributes["gpu.example.com"].model == "LATEST-GPU-MODEL"
- && device.capacity["gpu.example.com"].memory == quantity("80Gi")
+ exactly:
+ deviceClassName: gpu.example.com
+ selectors:
+ - cel:
+ expression: |-
+ device.attributes["gpu.example.com"].model == "LATEST-GPU-MODEL"
+ && device.capacity["gpu.example.com"].memory == quantity("80Gi")
原来的对应字段被挪到了 exactly 下面。exactly 可以变为 firstAvailable, 其下可以放置一个列表以提供备选。
Pod 的定义不需要任何修改。