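3. Installing via container
3.1. Installing the Docker image
How you install the Docker image depends on whether it is saved locally (offline installation) or pulled from Docker Hub (online installation).
Offline installation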
$ docker load -i <docker_image_save_path>
Online installation
$ docker pull <docker image link>
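The choice between the two installation paths can be sketched as a small shell helper. This is a minimal sketch, not part of the product: the function name `install_ipu_plugin_image` and the `DOCKER` override variable are assumptions introduced here for illustration. It runs `docker load -i` when a local archive exists and falls back to `docker pull` otherwise.

```shell
# Hypothetical helper (not shipped with the plugin): load the image from a
# local archive when one is given and exists, otherwise pull it.
# DOCKER may be overridden, for example DOCKER=echo, to get a dry run.
install_ipu_plugin_image() {
  archive="$1"   # path to a saved image archive; may be empty
  image="$2"     # image reference to pull when no archive is available
  if [ -n "$archive" ] && [ -f "$archive" ]; then
    "${DOCKER:-docker}" load -i "$archive"
  elif [ -n "$image" ]; then
    "${DOCKER:-docker}" pull "$image"
  else
    echo "usage: install_ipu_plugin_image <archive-or-empty> <image>" >&2
    return 1
  fi
}
```

Calling it with `DOCKER=echo` prints the command that would run instead of executing it, which is a convenient way to check the arguments first.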
3.2. Preparing ds.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ipu-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: ipu-dp-ds
  template:
    metadata:
      labels:
        name: ipu-dp-ds
    spec:
      hostNetwork: true
      containers:
      - image: graphcorecn/ipu-k8s-device-plugin:latest
        name: ipu-k8s-device-plugin
        securityContext:
          privileged: true
        volumeMounts:
        - name: dp
          mountPath: /var/lib/kubelet/device-plugins
        - name: sys
          mountPath: /sys
        - name: hostvolume
          mountPath: /etc/ipuof.conf.d
      volumes:
      - name: dp
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: sys
        hostPath:
          path: /sys
      - name: hostvolume
        hostPath:
          path: /etc/ipuof.conf.d
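The manifest mounts three host directories into the plugin container. Because none of the hostPath volumes sets a `type`, Kubernetes performs no existence check before mounting them, so it can be useful to confirm on the node that the directories are present before deploying. The following is a minimal sketch under that assumption; the function name `check_plugin_host_paths` and its optional root-prefix argument are hypothetical and exist only for illustration.

```shell
# Hypothetical pre-flight check: report which of the host paths mounted by
# ds.yaml exist on this node. The optional first argument is a root prefix,
# which allows the check to be exercised against a scratch directory.
check_plugin_host_paths() {
  root="${1:-}"
  missing=0
  for p in /var/lib/kubelet/device-plugins /sys /etc/ipuof.conf.d; do
    if [ -d "${root}${p}" ]; then
      echo "ok      ${p}"
    else
      echo "missing ${p}"
      missing=$((missing + 1))
    fi
  done
  # Succeed only when every path was found.
  [ "$missing" -eq 0 ]
}
```

Run without arguments on the target node, it prints one line per path and returns a non-zero status if any directory is absent.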
3.3. Deploying the Kubernetes IPU device plugin
Deploy the Kubernetes IPU device plugin with the following command:
$ kubectl apply -f ds.yaml
Use the following command to check whether the deployment succeeded. The output should include the new DaemonSet:
$ kubectl get ds -n kube-system
NAMESPACE     NAME                          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kube-system   ipu-device-plugin-daemonset   1         1         1       1            1           <none>
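The same check can be scripted by comparing the DESIRED and READY counts rather than reading the table by eye. This is a minimal sketch: the helper name `daemonset_ready` is an assumption made here, while `desiredNumberScheduled` and `numberReady` are fields of the standard DaemonSet status.

```shell
# Hypothetical helper: succeed only when at least one pod is desired and
# the ready count equals the desired count.
#
# The two numbers can be obtained with, for example:
#   kubectl get ds ipu-device-plugin-daemonset -n kube-system \
#     -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}'
daemonset_ready() {
  desired="${1:-}"
  ready="${2:-}"
  [ -n "$desired" ] \
    && [ "$desired" -gt 0 ] 2>/dev/null \
    && [ "$desired" = "$ready" ]
}
```

With the sample output above, `daemonset_ready 1 1` succeeds. A DaemonSet that is still starting, for example `daemonset_ready 1 0`, returns a non-zero status.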
View the logs with the following command. The output should look similar to this:
$ kubectl logs -f <ipu-device-plugin-daemonset-pod> -n kube-system
I1013 05:02:08.925503 1 main.go:25] Plugin version: dev
I1013 05:02:08.925598 1 main.go:30] Starting FS watcher.
I1013 05:02:08.925685 1 main.go:37] Starting OS watcher.
E1013 05:02:08.926158 1 utils.go:64] stat /etc/vipu/vipu-cli.hcl: no such file or directory
E1013 05:02:08.926237 1 vipuclient.go:33] error creating VIPU client: V-IPU configuration file not found: /etc/vipu/vipu-cli.hcl
W1013 05:02:08.926262 1 devicemanager.go:94] vipu client cannot be created: V-IPU configuration file not found: /etc/vipu/vipu-cli.hcl
W1013 05:02:08.956855 1 storage.go:45] unable to read existing storage file, a new empty one will be created /etc/ipuof.conf.d/storage/storage.json: open /etc/ipuof.conf.d/storage/storage.json: no such file or directory
I1013 05:02:08.957688 1 server.go:94] Starting GRPC server for 'c600.graphcore.ai/ipu'
I1013 05:02:08.958722 1 server.go:68] Started to serve 'c600.graphcore.ai/ipu' on /var/lib/kubelet/device-plugins/ipu.sock
I1013 05:02:08.965747 1 server.go:75] Registered device plugin for 'c600.graphcore.ai/ipu' with Kubelet
I1013 05:02:08.966352 1 server.go:171] Inside list and watch
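Each log line starts with a severity letter (I, W, E or F) followed by the date. In the sample above, the E and W lines about the V-IPU configuration file and the storage file appear before a successful registration, so the registration line is the more useful thing to look for. The two helpers below are a minimal sketch that reads log text on standard input. Their names are assumptions introduced here, and the registration message is matched exactly as it appears in the sample output.

```shell
# Hypothetical helper: count log lines per severity letter.
klog_severity_counts() {
  awk '
    /^[IWEF][0-9][0-9][0-9][0-9] / { n[substr($0, 1, 1)]++ }
    END { printf "I=%d W=%d E=%d F=%d\n", n["I"], n["W"], n["E"], n["F"] }
  '
}

# Hypothetical helper: succeed if the log contains the registration message
# shown in the sample output.
plugin_registered() {
  grep -Fq "Registered device plugin for 'c600.graphcore.ai/ipu' with Kubelet"
}
```

For example, piping `kubectl logs <ipu-device-plugin-daemonset-pod> -n kube-system` (without `-f`, so that the command terminates) into `plugin_registered` gives a scriptable success check.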
Run the following command:
$ kubectl describe nodes
You will see a new available device type, c600.graphcore.ai/ipu.
Capacity:
  cpu:                    48
  ephemeral-storage:      28703652Ki
  c600.graphcore.ai/ipu:  8
  hugepages-1Gi:          0
  hugepages-2Mi:          0
  memory:                 758240352Ki
  pods:                   110
Allocatable:
  cpu:                    48
  ephemeral-storage:      26453285640
  c600.graphcore.ai/ipu:  8
  hugepages-1Gi:          0
  hugepages-2Mi:          0
  memory:                 758137952Ki
  pods:                   110
...
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource               Requests    Limits
  --------               --------    ------
  cpu                    1 (2%)      0 (0%)
  memory                 140Mi (0%)  340Mi (0%)
  ephemeral-storage      0 (0%)      0 (0%)
  hugepages-1Gi          0 (0%)      0 (0%)
  hugepages-2Mi          0 (0%)      0 (0%)
  c600.graphcore.ai/ipu  0           0
Events:                  <none>
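Once the node advertises c600.graphcore.ai/ipu, a workload can request it like any other Kubernetes extended resource, by listing it under `resources.limits`. The sketch below prints a minimal Pod manifest for that purpose. The function name `ipu_pod_manifest`, the pod name and the container image are placeholders chosen here for illustration and are not taken from the product documentation.

```shell
# Hypothetical helper: print a Pod manifest that requests IPUs through the
# extended resource name shown by "kubectl describe nodes".
#   $1 = pod (and container) name   $2 = container image   $3 = IPU count
ipu_pod_manifest() {
  name="$1"
  image="$2"
  count="${3:-1}"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  restartPolicy: Never
  containers:
  - name: ${name}
    image: ${image}
    resources:
      limits:
        c600.graphcore.ai/ipu: ${count}
EOF
}
```

The output can be reviewed and then piped into `kubectl apply -f -`. After the pod is scheduled, the c600.graphcore.ai/ipu row under "Allocated resources" should no longer show 0.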