3. High-level architecture

The integrated system combines Graphcore’s IPU-POD hardware and software stack with OpenStack.

The IPU-POD components provide IPU compute resources, together with the management services that allocate resources to users and keep each user's sessions separate from those of other users.

OpenStack components provision network devices (switches) and servers. This lets users request Poplar VMs that have access to IPU resources, on which they can run their training or inference workloads.


Fig. 3.1 IPU and OpenStack architecture overview
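To illustrate the kind of request OpenStack handles when a user asks for a Poplar VM, here is a minimal sketch. It assumes the openstacksdk Python library and an existing Poplar image in Glance. The image, flavor and network names are hypothetical placeholders, and giving the VM access to IPU resources is not shown.

```python
from typing import Any, Dict


def server_request(name: str, image_id: str, flavor_id: str, network_id: str) -> Dict[str, Any]:
    """Build keyword arguments for openstacksdk's conn.compute.create_server()."""
    return {
        "name": name,
        "image_id": image_id,                # Glance image the VM boots from
        "flavor_id": flavor_id,              # Nova flavor (vCPUs, RAM, disk)
        "networks": [{"uuid": network_id}],  # Neutron network to attach
    }


def boot_poplar_vm(conn, name: str, image_name: str, flavor_name: str, network_name: str):
    """Boot a VM through Nova and block until it becomes ACTIVE.

    `conn` is an openstack.connection.Connection, for example from openstack.connect().
    ignore_missing=False makes each lookup raise if the named resource does not exist.
    """
    image = conn.image.find_image(image_name, ignore_missing=False)
    flavor = conn.compute.find_flavor(flavor_name, ignore_missing=False)
    network = conn.network.find_network(network_name, ignore_missing=False)
    server = conn.compute.create_server(
        **server_request(name, image.id, flavor.id, network.id)
    )
    # Polls until ACTIVE; raises if the server goes to ERROR or the wait times out.
    return conn.compute.wait_for_server(server)
```

With a configured `clouds.yaml`, a call might look like `boot_poplar_vm(openstack.connect(cloud="my-cloud"), "poplar-vm-1", "poplar-image", "poplar-flavor", "ipu-net")`, where every name is a placeholder. Passing the connection in, rather than creating it inside the function, keeps the request-building logic testable without a live cloud.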

3.1. Functional components

3.1.1. IPU-POD hardware/software stack

  • IPU-Machine (IPU-M2000): blade that contains four IPUs and an IPU-Gateway chip, which runs VIRM (V-IPU Resource Manager)

  • Host (Poplar) server: head node; a server machine that hosts Poplar VMs

  • Poplar VM: virtual machine spawned by OpenStack Nova and used by the end user to start and run workloads

  • Compute node: generic server hosting a vPOD Controller VM

  • vPOD Controller VM: VM running several services that manage the IPU-Machines, using the V-IPU Controller via the machines' VIRMs, and that monitor the IPU-Machines and collect their logs
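The relationships in this list can be made concrete with a small data model. It is purely illustrative: the class names, the fields and the `virm_address` attribute are hypothetical and not part of any Graphcore API. Only the figure of four IPUs per IPU-Machine comes from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

IPUS_PER_MACHINE = 4  # an IPU-M2000 contains 4 IPUs


@dataclass
class IPUMachine:
    """One IPU-Machine; its IPU-Gateway chip runs VIRM, modelled here as an address."""
    name: str
    virm_address: str  # hypothetical: where a controller would reach this machine's VIRM
    ipus: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Populate the four IPU identifiers if none were given.
        if not self.ipus:
            self.ipus = [f"{self.name}/ipu{i}" for i in range(IPUS_PER_MACHINE)]


@dataclass
class VPodController:
    """Hypothetical stand-in for the V-IPU Controller running in the vPOD Controller VM."""
    machines: Dict[str, IPUMachine] = field(default_factory=dict)

    def add_machine(self, machine: IPUMachine) -> None:
        self.machines[machine.name] = machine

    def ipu_count(self) -> int:
        return sum(len(m.ipus) for m in self.machines.values())
```

For example, a controller managing two IPU-Machines reports eight IPUs.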

3.1.2. OpenStack

  • Horizon: web dashboard for managing the whole infrastructure

  • Keystone: identity service, handles authentication and authorization

  • Nova: compute service, manages VMs

  • Neutron: networking service (using ML2/OVS), manages subnets, VLANs, and so on

  • Glance: image service, stores and serves VM images

  • Cinder: block storage service, stores and serves volumes that can be attached to a VM
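For readers scripting against these services, the sketch below maps each one to the attribute that exposes it on an openstacksdk `Connection` object. The attribute names follow openstacksdk's naming. The table itself and the `sdk_proxy_name` helper are illustrative and are not part of the library.

```python
from typing import Optional

# Service -> (role in this architecture, openstacksdk Connection attribute).
# Horizon is a web dashboard and has no SDK proxy.
OPENSTACK_SERVICES = {
    "Horizon": ("web dashboard", None),
    "Keystone": ("identity: authentication and authorization", "identity"),
    "Nova": ("compute: VM lifecycle", "compute"),
    "Neutron": ("networking: subnets, VLANs", "network"),
    "Glance": ("VM images", "image"),
    "Cinder": ("block storage volumes", "block_storage"),
}


def sdk_proxy_name(service: str) -> Optional[str]:
    """Return the openstacksdk proxy attribute for a service, or None if it has none."""
    return OPENSTACK_SERVICES[service][1]
```

Given a connection `conn`, `getattr(conn, sdk_proxy_name("Glance"))` then resolves to `conn.image`.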

3.2. Use cases and component interactions

A common use case is to request IPU resources, run a workload, and then release the IPU resources.


Fig. 3.2 IPU and OpenStack architecture overview
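The request, run, release pattern maps naturally onto a Python context manager, which guarantees the release step even if the workload fails. This is a sketch that assumes hypothetical `allocate` and `release` callables standing in for whatever orchestration interface a CSP exposes; it does not call any Graphcore or OpenStack API.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def ipu_session(allocate: Callable[[], str], release: Callable[[str], None]) -> Iterator[str]:
    """Allocate IPU resources, yield a handle to them, and always release them afterwards."""
    handle = allocate()
    try:
        yield handle
    finally:
        # Runs whether the workload completed or raised, so resources are not leaked.
        release(handle)
```

A workload then runs inside `with ipu_session(allocate, release) as handle:`, and the release happens when the block exits, whether normally or through an exception.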

Component interactions for the “start VM and run workload” use case:

  1. The end user asks the cloud service provider's (CSP's) IPU Orchestration to allocate IPU resources.

  2. IPU Orchestration uses OpenStack to spawn a new VM running the V-IPU Controller, which manages the vPOD.

  3. IPU Orchestration allocates IPU-Machines and configures them in the V-IPU Controller.

  4. IPU Orchestration uses OpenStack to start a user Poplar VM on a selected Poplar server.

  5. OpenStack Nova provisions the VM on the specified Poplar server, using the specified VM image from Glance.

  6. The end user opens an SSH session to the Poplar VM.

  7. The user requests the creation of a partition with the selected IPU-Machines and their IPUs.

  8. The V-IPU Controller configures the hardware according to the specified partition information.

  9. The user starts a workload on the Poplar VM.

  10. When the workload completes, the user destroys the partition and the Poplar VM.
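Steps 7 and 8 can be pictured as building a partition description from IPU-Machines that were already allocated to the user's vPOD in step 3. This is a hypothetical sketch: the dictionary layout and the `create_partition` function are illustrative and do not reproduce the V-IPU Controller's actual interface or partition format.

```python
from typing import Dict, List

IPUS_PER_MACHINE = 4  # each IPU-M2000 contains 4 IPUs


def create_partition(name: str, vpod_machines: List[str], selected: List[str]) -> Dict[str, object]:
    """Describe a partition made of all IPUs on the selected IPU-Machines.

    Raises ValueError if a selected machine is not allocated to this vPOD, which is
    how this sketch models keeping one user's resources separate from another's.
    """
    foreign = [m for m in selected if m not in vpod_machines]
    if foreign:
        raise ValueError(f"machines not in this vPOD: {foreign}")
    ipus = [f"{m}/ipu{i}" for m in selected for i in range(IPUS_PER_MACHINE)]
    return {"name": name, "machines": list(selected), "ipus": ipus}
```

Selecting two of a vPOD's IPU-Machines yields a partition of eight IPUs; selecting a machine outside the vPOD is rejected before any hardware would be configured.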