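3. Installing via container

3.1. Install the Docker image

Install the Docker image either from an archive saved locally (offline installation) or by pulling it from Docker Hub (online installation):

  • Offline installation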

    $ docker load -i <docker_image_save_path>
    
  • Online installation

    $ docker pull <docker image link>
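
The choice between the two methods can be scripted. The following is a minimal sketch and not part of the plugin: install_image is a hypothetical helper that loads the image from a local archive when one is given and exists, and otherwise pulls it by reference. Both arguments are values you supply.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the plugin): load the image from a
# local archive if one is provided and exists, otherwise pull it.
#   $1: image reference passed to "docker pull"
#   $2: optional path to an archive created with "docker save"
install_image() {
    image_ref=$1
    archive=${2:-}
    if [ -n "$archive" ] && [ -f "$archive" ]; then
        docker load -i "$archive"    # offline installation
    else
        docker pull "$image_ref"     # online installation
    fi
}
```

For example, "install_image graphcorecn/ipu-k8s-device-plugin:latest <docker_image_save_path>" uses the archive if it is present and falls back to pulling the image otherwise.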
    

3.2. Prepare ds.yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ipu-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: ipu-dp-ds
  template:
    metadata:
      labels:
        name: ipu-dp-ds
    spec:
      hostNetwork: true
      containers:
      - image: graphcorecn/ipu-k8s-device-plugin:latest
        name: ipu-k8s-device-plugin
        securityContext:
          privileged: true
        volumeMounts:
        - name: dp
          mountPath: /var/lib/kubelet/device-plugins
        - name: sys
          mountPath: /sys
        - name: hostvolume
          mountPath: /etc/ipuof.conf.d
      volumes:
      - name: dp
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: sys
        hostPath:
          path: /sys
      - name: hostvolume
        hostPath:
          path: /etc/ipuof.conf.d

ds.yaml

3.3. Deploy the Kubernetes IPU device plugin

Deploy the Kubernetes IPU device plugin as follows:

$ kubectl apply -f ds.yaml

Use the following command to check whether the deployment succeeded. The output should include the new DaemonSet:

$ kubectl get ds -n kube-system

NAMESPACE     NAME                          DESIRED  CURRENT  READY  UP-TO-DATE  AVAILABLE  NODE SELECTOR  AGE
kube-system   ipu-device-plugin-daemonset   1        1        1      1           1          <none>
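
For scripted deployments it can be more convenient to block until the DaemonSet is ready than to read the table by eye. The following is a minimal sketch, assuming the DaemonSet name and namespace from ds.yaml in section 3.2. The function name wait_for_ipu_plugin and the default timeout of 120 seconds are illustrative choices, not part of the plugin.

```shell
#!/bin/sh
# Hypothetical helper: wait until the device plugin DaemonSet has rolled out.
#   $1: optional timeout understood by kubectl, for example 30s or 2m
#       (the 120s default is an arbitrary choice)
wait_for_ipu_plugin() {
    timeout=${1:-120s}
    kubectl rollout status daemonset/ipu-device-plugin-daemonset \
        -n kube-system --timeout="$timeout"
}
```

The function returns the exit status of kubectl, so it can gate later steps, for example "wait_for_ipu_plugin 60s && kubectl describe nodes".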

View the log with the following command. The output should resemble the following:

$ kubectl logs -f <ipu-device-plugin-daemonset-pod> -n kube-system

I1013 05:02:08.925503       1 main.go:25] Plugin version: dev
I1013 05:02:08.925598       1 main.go:30] Starting FS watcher.
I1013 05:02:08.925685       1 main.go:37] Starting OS watcher.
E1013 05:02:08.926158       1 utils.go:64] stat /etc/vipu/vipu-cli.hcl: no such file or directory
E1013 05:02:08.926237       1 vipuclient.go:33] error creating VIPU client: V-IPU configuration file not found: /etc/vipu/vipu-cli.hcl
W1013 05:02:08.926262       1 devicemanager.go:94] vipu client cannot be created: V-IPU configuration file not found: /etc/vipu/vipu-cli.hcl
W1013 05:02:08.956855       1 storage.go:45] unable to read existing storage file, a new empty one will be created /etc/ipuof.conf.d/storage/storage.json: open /etc/ipuof.conf.d/storage/storage.json: no such file or directory
I1013 05:02:08.957688       1 server.go:94] Starting GRPC server for 'c600.graphcore.ai/ipu'
I1013 05:02:08.958722       1 server.go:68] Started to serve 'c600.graphcore.ai/ipu' on /var/lib/kubelet/device-plugins/ipu.sock
I1013 05:02:08.965747       1 server.go:75] Registered device plugin for 'c600.graphcore.ai/ipu' with Kubelet
I1013 05:02:08.966352       1 server.go:171] Inside list and watch
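
In the sample log, the line reporting that the plugin registered c600.graphcore.ai/ipu with the kubelet is the one a script would most naturally look for. The following sketch assumes the message text matches the sample output exactly. plugin_registered is a hypothetical helper that reads a log on standard input and succeeds only if that line is present.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the log on stdin contains the registration
# message shown in the sample output, non-zero otherwise.
# -F matches the text literally, so the dots are not treated as wildcards.
plugin_registered() {
    grep -qF "Registered device plugin for 'c600.graphcore.ai/ipu' with Kubelet"
}
```

It can be combined with the log command from this section (without -f, so that the command terminates):

$ kubectl logs <ipu-device-plugin-daemonset-pod> -n kube-system | plugin_registered && echo registered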

Run the following command:

$ kubectl describe nodes

The output lists a new available device type, c600.graphcore.ai/ipu:

Capacity:
   cpu: 48
   ephemeral-storage: 28703652Ki
   c600.graphcore.ai/ipu: 8
   hugepages-1Gi: 0
   hugepages-2Mi: 0
   memory: 758240352Ki
   pods: 110
Allocatable:
   cpu: 48
   ephemeral-storage: 26453285640
   c600.graphcore.ai/ipu: 8
   hugepages-1Gi: 0
   hugepages-2Mi: 0
   memory: 758137952Ki
   pods: 110

...

Allocated resources:
   (Total limits may be over 100 percent, i.e., overcommitted.)
   Resource               Requests     Limits
   --------               --------     ------
   cpu                    1 (2%)       0 (0%)
   memory                 140Mi (0%)   340Mi (0%)
   ephemeral-storage      0 (0%)       0 (0%)
   hugepages-1Gi          0 (0%)       0 (0%)
   hugepages-2Mi          0 (0%)       0 (0%)
   c600.graphcore.ai/ipu  0            0
Events: <none>
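
Once the node advertises the resource, a workload can request IPUs through the standard Kubernetes extended-resource mechanism, by listing c600.graphcore.ai/ipu under resources.limits of a container. The following sketch is not taken from the plugin itself: render_ipu_pod is a hypothetical helper, and the Pod name ipu-test, the container image, and the sleep command are placeholders to replace with your own workload.

```shell
#!/bin/sh
# Hypothetical helper: print a Pod manifest that requests IPUs.
#   $1: container image to run (placeholder, supply your own)
#   $2: number of IPUs to request (defaults to 1)
render_ipu_pod() {
    image=$1
    ipus=${2:-1}
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ipu-test
spec:
  restartPolicy: Never
  containers:
  - name: ipu-test
    image: ${image}
    command: ["sleep", "3600"]
    resources:
      limits:
        c600.graphcore.ai/ipu: ${ipus}
EOF
}
```

The manifest can be piped straight to kubectl:

$ render_ipu_pod <your_image> 2 | kubectl apply -f -

After the Pod is scheduled, the c600.graphcore.ai/ipu row under "Allocated resources" in the output of kubectl describe nodes should reflect the requested count.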