# Dynamic Resource Allocation
相关知识请参阅 Kubernetes 官网文档 , 目前是 1.34 版本。
## Kubernetes 1.32
请参阅 Kubernetes 官网文档 .
`DynamicResourceAllocation` 在 Kubernetes 1.32 上为 beta 特性,需要额外参数启用。如果集群是用 `kubeadm` 安装的,控制平面运行在 Pod 里,可用以下命令检查:
```console
$ kubectl get po -n kube-system -l tier=control-plane
NAME READY STATUS RESTARTS AGE
etcd-las0 1/1 Running 3 (23d ago) 177d
kube-apiserver-las0 1/1 Running 0 23h
kube-controller-manager-las0 1/1 Running 1 (23h ago) 23h
kube-scheduler-las0 1/1 Running 0 23h
```
这种情况下,需要在所有控制平面节点上修改以下三个文件:
1. `/etc/kubernetes/manifests/kube-apiserver.yaml`
:::{literalinclude} /_files/ubuntu/etc/kubernetes/manifests/kube-apiserver.yaml
:diff: /_files/ubuntu/etc/kubernetes/manifests/kube-apiserver.yaml.orig
:::
2. `/etc/kubernetes/manifests/kube-controller-manager.yaml`
:::{literalinclude} /_files/ubuntu/etc/kubernetes/manifests/kube-controller-manager.yaml
:diff: /_files/ubuntu/etc/kubernetes/manifests/kube-controller-manager.yaml.orig
:::
3. `/etc/kubernetes/manifests/kube-scheduler.yaml`
:::{literalinclude} /_files/ubuntu/etc/kubernetes/manifests/kube-scheduler.yaml
:diff: /_files/ubuntu/etc/kubernetes/manifests/kube-scheduler.yaml.orig
:::
修改完成后相关的 Pod 会自动重启。
### 安装 dra-example-driver
`dra-example-driver` 是一个 DRA 设备驱动的 DEMO.
下载源码:
```console
$ git clone git@github.com:kubernetes-sigs/dra-example-driver.git
```
构建驱动(使用 docker 需要设置环境变量)
```console
$ cd dra-example-driver/
$ CONTAINER_TOOL=docker ./demo/build-driver.sh
```
使用 `helm` 部署到集群:
```console
$ helm upgrade -i --create-namespace --namespace dra-example-driver dra-example-driver deployments/helm/dra-example-driver
Release "dra-example-driver" does not exist. Installing it now.
NAME: dra-example-driver
LAST DEPLOYED: Tue Nov 4 17:45:05 2025
NAMESPACE: dra-example-driver
STATUS: deployed
REVISION: 1
TEST SUITE: None
```
查看其 Workloads:
```console
$ kubectl get all -n dra-example-driver
NAME READY STATUS RESTARTS AGE
pod/dra-example-driver-kubeletplugin-67x59 1/1 Running 0 3m53s
pod/dra-example-driver-kubeletplugin-j2bl2 1/1 Running 0 3m53s
pod/dra-example-driver-kubeletplugin-ndsw9 1/1 Running 0 3m53s
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/dra-example-driver-kubeletplugin 3 3 3 3 3 3m53s
```
查看生成的 ResourcesSlices 和 DeviceClasses:
```console
$ kubectl get resourceslice
NAME NODE DRIVER POOL AGE
las1-gpu.example.com-n297b las1 gpu.example.com las1 4m8s
las2-gpu.example.com-tpwcb las2 gpu.example.com las2 4m18s
las3-gpu.example.com-m4zn4 las3 gpu.example.com las3 4m22s
$ kubectl get deviceclasses
NAME AGE
gpu.example.com 7m39s
```
进一步查看 ResourceSlice 的说明:
```console
$ kdesc resourceslice las1-gpu.example.com-n297b
Name: las1-gpu.example.com-n297b
Namespace:
Labels:
Annotations:
API Version: resource.k8s.io/v1beta1
Kind: ResourceSlice
...
Spec:
Devices:
Basic:
Attributes:
Driver Version:
Version: 1.0.0
Index:
Int: 0
Model:
String: LATEST-GPU-MODEL
Uuid:
String: gpu-94011f0b-8dcd-b4b0-cd99-40eab2e3c96a
Capacity:
Memory:
Value: 80Gi
Name: gpu-0
...
```
可以看到生成了名为 `gpu-*` 的设备(实际上每个节点上有 8 个)。
:::{note}
驱动卸载时没有删除 ResourceSlices. 用以下命令删除:
```console
$ kubectl delete resourceslice --field-selector spec.driver=gpu.example.com
resourceslice.resource.k8s.io "las1-gpu.example.com-n297b" deleted
resourceslice.resource.k8s.io "las2-gpu.example.com-tpwcb" deleted
resourceslice.resource.k8s.io "las3-gpu.example.com-m4zn4" deleted
```
:::
### 测试
在集群内创建一个 ResourceClaimTemplate:
```console
$ kubectl apply -f example_resourceclaimtemplate.yaml
resourceclaimtemplate.resource.k8s.io/example created
```
其定义如下:
:::{literalinclude} /_files/macos/workspace/k8s/dra/example_resourceclaimtemplate.yaml
:::
再创建一个 Pod 进行测试。Pod 的定义如下:
:::{literalinclude} /_files/macos/workspace/k8s/dra/example_claim_po.yaml
:::
创建 Pod 时可以监视 ResourceClaim 资源的变化:
```console
$ kubectl get resourceclaim -w
NAME STATE AGE
example-claim-example-4cb2t pending 0s
example-claim-example-4cb2t pending 0s
example-claim-example-4cb2t allocated,reserved 0s
example-claim-example-4cb2t pending 63s
example-claim-example-4cb2t pending 63s
example-claim-example-4cb2t pending 63s
```
这种自动生成的 ResourceClaim 的所有者是这个 Pod, 当 Pod 被删除时它也被删除。
## Kubernetes 1.34
把集群升级到 1.34:
```console
$ kubectl get no
NAME STATUS ROLES AGE VERSION
las0 Ready control-plane 189d v1.34.2
las1 Ready 189d v1.34.2
las2 Ready 189d v1.34.2
las3 Ready 185d v1.34.2
```
`DynamicResourceAllocation` 特性在 Kubernetes 1.34 上默认启用,所以之前的额外参数可以去掉,但别忘了升级服务映像的版本:
1. `/etc/kubernetes/manifests/kube-apiserver.yaml`
:::{literalinclude} /_files/ubuntu/etc/kubernetes/manifests/kube-apiserver_1.34.yaml
:diff: /_files/ubuntu/etc/kubernetes/manifests/kube-apiserver.yaml.orig
:::
2. `/etc/kubernetes/manifests/kube-controller-manager.yaml`
:::{literalinclude} /_files/ubuntu/etc/kubernetes/manifests/kube-controller-manager_1.34.yaml
:diff: /_files/ubuntu/etc/kubernetes/manifests/kube-controller-manager.yaml.orig
:::
3. `/etc/kubernetes/manifests/kube-scheduler.yaml`
:::{literalinclude} /_files/ubuntu/etc/kubernetes/manifests/kube-scheduler_1.34.yaml
:diff: /_files/ubuntu/etc/kubernetes/manifests/kube-scheduler.yaml.orig
:::
检查 API 版本以确认:
```console
$ kubectl api-versions | grep resource.k8s.io
resource.k8s.io/v1
```
重新安装 dra-example-driver.
ResourceClaimTemplate 需要修改:
:::{literalinclude} /_files/macos/workspace/k8s/dra/example_resourceclaimtemplate_1.34.yaml
:diff: /_files/macos/workspace/k8s/dra/example_resourceclaimtemplate.yaml
:::
原来的对应字段被挪到了 `exactly` 下面。`exactly` 可以变为 `firstAvailable`, 其下可以放置一个列表以提供备选。
Pod 的定义不需要任何修改。