3. High-level architecture
The integrated system consists of Graphcore’s IPU-POD hardware/software and OpenStack software.
The IPU-POD components provide the IPU compute resources, along with the management functions that allocate resources to users and keep each user's sessions isolated from those of other users.
The OpenStack components provision network devices (switches) and servers. This lets users request Poplar VMs that have access to IPU resources, on which they can run their training or inference workloads.
3.1. Functional components
3.1.1. IPU-POD hardware/software stack
IPU-Machine (IPU-M2000): a blade containing four IPUs and an IPU-Gateway chip, which runs VIRM (V-IPU Resource Manager)
Host (Poplar) server: the head node; a server that hosts Poplar VMs
Poplar VM: a virtual machine spawned by OpenStack Nova, in which the end user starts and runs workloads
Compute node: a generic server that hosts a vPOD Controller VM
vPOD Controller VM: a VM running several services that manage the IPU-Machines (using the V-IPU Controller, which communicates with the VIRM on each IPU-Machine) and that monitor the IPU-Machines and collect their logs
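To make the relationships above concrete, here is a minimal in-memory sketch. The class names `IPUMachine` and `VPODController` and their methods are hypothetical and are not part of any Graphcore API. The sketch captures only what the text states: each IPU-Machine holds four IPUs, and the management layer keeps one user's resources separate from another's. Allocating at the granularity of whole IPU-Machines is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

IPUS_PER_MACHINE = 4  # each IPU-M2000 blade contains 4 IPUs


@dataclass
class IPUMachine:
    """Hypothetical model of one IPU-Machine (IPU-M2000)."""
    name: str
    owner: Optional[str] = None  # user the machine is allocated to, if any


@dataclass
class VPODController:
    """Hypothetical bookkeeping for the IPU-Machines in one vPOD."""
    machines: List[IPUMachine] = field(default_factory=list)

    def allocate(self, user: str, count: int) -> List[IPUMachine]:
        """Give `user` `count` free machines; never hand out a machine twice."""
        free = [m for m in self.machines if m.owner is None]
        if len(free) < count:
            raise RuntimeError(f"{count} IPU-Machines requested, {len(free)} free")
        chosen = free[:count]
        for machine in chosen:
            machine.owner = user
        return chosen

    def release(self, user: str) -> None:
        """Return every machine held by `user` to the free pool."""
        for machine in self.machines:
            if machine.owner == user:
                machine.owner = None

    def ipus_owned_by(self, user: str) -> int:
        """Count the IPUs (not machines) currently allocated to `user`."""
        return IPUS_PER_MACHINE * sum(m.owner == user for m in self.machines)


ctrl = VPODController([IPUMachine(f"ipum{i}") for i in range(4)])
ctrl.allocate("alice", 2)
ctrl.allocate("bob", 1)
print(ctrl.ipus_owned_by("alice"))  # 8
print(ctrl.ipus_owned_by("bob"))    # 4
```

Whole-machine allocation is only one possible policy; the property the sketch illustrates is that a machine already owned by one user is never handed to another.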
3.1.2. OpenStack components
Horizon: web interface for managing the whole infrastructure
Keystone: identity service; handles authentication and authorization
Nova: compute service; manages VMs
Neutron: networking service (using the ML2 plugin with the Open vSwitch (OVS) mechanism driver); manages subnets, VLANs, and so on
Glance: image service; stores and serves VM images
Cinder: block storage service; stores and serves volumes that can be attached to a VM
3.2. Use cases and component interactions
A common use case is to request IPU resources, run a workload, and then release the IPU resources.
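The request, run, release pattern maps naturally onto a context manager that guarantees the release step even if the workload fails. The following is a minimal sketch; `allocate_ipus`, `release_ipus` and the `free_machines` pool are hypothetical placeholders for whatever the CSP's IPU Orchestration actually exposes, and here they only update an in-memory set.

```python
from contextlib import contextmanager

# Hypothetical in-memory pool standing in for the CSP's IPU Orchestration.
free_machines = {"ipum0", "ipum1", "ipum2", "ipum3"}


def allocate_ipus(count):
    """Hypothetical: reserve `count` IPU-Machines, or fail if too few are free."""
    if len(free_machines) < count:
        raise RuntimeError("not enough free IPU-Machines")
    chosen = sorted(free_machines)[:count]
    free_machines.difference_update(chosen)
    return chosen


def release_ipus(machines):
    """Hypothetical: return IPU-Machines to the pool."""
    free_machines.update(machines)


@contextmanager
def ipu_resources(count):
    machines = allocate_ipus(count)  # request IPU resources
    try:
        yield machines               # the caller runs its workload here
    finally:
        release_ipus(machines)       # always release, even if the workload fails


def run_workload(machines):
    raise RuntimeError("training crashed")  # simulate a failing job


try:
    with ipu_resources(2) as machines:
        run_workload(machines)
except RuntimeError:
    pass

print(len(free_machines))  # 4: both machines were returned despite the failure
```

Tying release to the `finally` clause means a crashed or interrupted job cannot leave IPU-Machines stranded in an allocated state.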
The following steps show how the components interact when a user starts a VM and runs a workload:
1. The end user asks the cloud service provider’s (CSP’s) IPU Orchestration to allocate IPU resources.
2. IPU Orchestration uses OpenStack to spawn a new VM running the V-IPU Controller, which manages the vPOD.
3. IPU Orchestration allocates IPU-Machines and configures them in the V-IPU Controller.
4. IPU Orchestration uses OpenStack to start a Poplar VM for the user on the selected Poplar server.
5. OpenStack Nova provisions the VM on the specified Poplar server, using the specified VM image from Glance.
6. The end user opens an SSH session to the Poplar VM.
7. The user requests the creation of a partition containing the selected IPU-Machines and their IPUs.
8. The V-IPU Controller configures the hardware according to the specified partition information.
9. The user starts a workload on the Poplar VM.
10. When the workload completes, the user destroys the partition and the Poplar VM.
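The steps above can be read as a small dependency graph: each step is valid only once the steps it relies on have run. The sketch below checks an ordering against dependencies inferred from the text (for example, a partition can only be created once IPU-Machines have been configured in the V-IPU Controller and the user has a session on the Poplar VM). The step names and the `PREREQUISITES` table are hypothetical labels chosen for illustration, not identifiers used by IPU Orchestration, OpenStack or V-IPU.

```python
# Hypothetical labels for the steps, each mapped to the steps it depends on.
PREREQUISITES = {
    "request_resources": [],
    "spawn_controller_vm": ["request_resources"],
    "allocate_machines": ["spawn_controller_vm"],
    "start_poplar_vm": ["request_resources"],
    "nova_provision_vm": ["start_poplar_vm"],
    "open_ssh_session": ["nova_provision_vm"],
    "create_partition": ["open_ssh_session", "allocate_machines"],
    "configure_hardware": ["create_partition"],
    "run_workload": ["configure_hardware"],
    "destroy_partition_and_vm": ["run_workload"],
}

# Dicts preserve insertion order (Python 3.7+), so this is the documented order.
DOCUMENTED_ORDER = list(PREREQUISITES)


def first_violation(order):
    """Return (step, missing prerequisites) for the first invalid step, else None."""
    done = set()
    for step in order:
        missing = [p for p in PREREQUISITES[step] if p not in done]
        if missing:
            return step, missing
        done.add(step)
    return None


print(first_violation(DOCUMENTED_ORDER))  # None

# Creating a partition before any IPU-Machines are configured is rejected.
bad = ["request_resources", "start_poplar_vm", "nova_provision_vm",
       "open_ssh_session", "create_partition"]
print(first_violation(bad))  # ('create_partition', ['allocate_machines'])
```

Note that the Poplar VM and the V-IPU Controller setup are independent branches in this graph; only partition creation needs both.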