4. Creating a Pod and Using IPUs

You can use IPUs through whichever Kubernetes resource type you need, such as a Deployment or a Pod.

Prepare test.yaml:

  • Kubernetes Pod example

    apiVersion: v1
    kind: Pod
    metadata:
      name: ipu-test-1
    
    spec:
      containers:
      - name: demo-ipu-test
        image: graphcore/pytorch:latest 
        command: ["/bin/bash", "-c", "--"]
        args: ["sleep infinity & wait"]
        resources:
          limits:
            c600.graphcore.ai/ipu: "1" # Number of IPUs allocated to the Pod
    

    pod-test.yaml (rename to test.yaml)

  • Deployment example

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ipu-test
      namespace: default
      labels:
        app: app-test
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: app-test
      template:
        metadata:
          labels:
            app: app-test
        spec:
          containers:
          - name: demo-ipu-test
            image: graphcore/pytorch:latest 
            command: ["/bin/bash", "-c", "--"]
            args: ["sleep infinity & wait"]
            resources:
              limits:
                c600.graphcore.ai/ipu: "1"
    

    deployment-test.yaml (rename to test.yaml)

Deployment features such as replica scaling and rollback are still supported. Run the following command to create the Pod or Deployment:

$ kubectl apply -f test.yaml

Use the following command to check whether the Pod is running. The expected output is:

$ kubectl get pod

NAME         READY   STATUS    RESTARTS   AGE
ipu-test-1   1/1     Running   0          4d
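Rather than polling `kubectl get pod` by hand, a script can block until the Pod reports Ready. The following is a minimal sketch: the helper name `wait_for_pod_ready` and the 120-second default timeout are assumptions made for illustration, and the default Pod name is taken from the Pod example above.

```shell
# Hypothetical helper (not part of the device plugin): block until a Pod is Ready.
# $1 = Pod name (defaults to ipu-test-1 from the Pod example)
# $2 = timeout (defaults to 120s, an arbitrary choice)
wait_for_pod_ready() {
  kubectl wait --for=condition=Ready "pod/${1:-ipu-test-1}" --timeout="${2:-120s}"
}

# Usage:
#   wait_for_pod_ready               # waits for ipu-test-1
#   wait_for_pod_ready my-pod 300s   # waits up to 5 minutes for my-pod
```

If the Pod stays Pending past the timeout, `kubectl describe pod` on it usually shows whether the scheduler could not find a node with a free IPU.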

Run the following command again:

$ kubectl describe nodes

You will see that the number of allocated IPUs has changed from 0 to 1.

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource               Requests    Limits
  --------               --------    ------
  cpu                    1 (2%)      0 (0%)
  memory                 140Mi (0%)  340Mi (0%)
  ephemeral-storage      0 (0%)      0 (0%)
  hugepages-1Gi          0 (0%)      0 (0%)
  hugepages-2Mi          0 (0%)      0 (0%)
  c600.graphcore.ai/ipu  1           1

 ...
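The full `kubectl describe nodes` output is long, so it can help to extract only the IPU row of the "Allocated resources" table. This is a sketch under assumptions: the helper name `ipu_allocated` is hypothetical, and the filter relies on the row layout shown above (resource name, Requests, Limits).

```shell
# Hypothetical helper: print "Requests Limits" for the IPU resource, one line per node.
# The Capacity/Allocatable entries end the resource name with a colon, so the
# exact match on field 1 only hits the "Allocated resources" row.
ipu_allocated() {
  kubectl describe nodes |
    awk '$1 == "c600.graphcore.ai/ipu" && NF == 3 { print $2, $3 }'
}

# Usage:
#   ipu_allocated
```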

At this point, the K8s cluster can schedule and use IPU devices through the Kubernetes IPU device plugin.
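As noted earlier, replica scaling and rollback remain available when the workload is created from the Deployment example. The sketch below wraps the standard kubectl commands for both. The helper names are hypothetical, and it assumes the Deployment is named `ipu-test` in the current namespace, as in the example manifest.

```shell
# Assumption: the Deployment from the example manifest, in the current namespace.
DEPLOYMENT=ipu-test

# Hypothetical helper: scale to $1 replicas and wait for the rollout to finish.
# Each replica requests one IPU, so new Pods stay Pending if no IPU is free.
scale_ipu_deployment() {
  kubectl scale "deployment/$DEPLOYMENT" --replicas="$1"
  kubectl rollout status "deployment/$DEPLOYMENT"
}

# Hypothetical helper: roll back to the previous revision of the Deployment.
rollback_ipu_deployment() {
  kubectl rollout undo "deployment/$DEPLOYMENT"
}

# Usage:
#   scale_ipu_deployment 2
#   rollback_ipu_deployment
```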