6. Storage

6.1. High performance storage appliance

To provide shared storage with sufficient performance for IPU Pod processing, this reference design uses a storage appliance with the following features:

  • NFS performance to serve a single client at up to 10 GB/s (using nconnect=16).

  • Support for 8-way active LAG connectivity (see Section 8.7, Link aggregation), providing up to 800 Gb/s of connection bandwidth.

  • S3-compatible object storage performance to serve a single client (using multiple threads and connections) at up to 6 GB/s; an illustrative access pattern is sketched after this list.

  • Support for multiple VLANs, subnets and access interfaces that can be configured dynamically via an API; one of each is required per vPOD.

  • Support for multiple NFS filesystems that can be created dynamically via an API, with access restricted to specified subnets; a hypothetical provisioning sketch is also shown after this list.

  • Support for dynamic creation of object storage access accounts via an API.

  • Automated snapshotting of NFS filesystems without performance impact.

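As an illustration of the multi-threaded S3 access pattern mentioned above, the sketch below uses Python with boto3 and a TransferConfig to spread a single download over several concurrent connections. The endpoint URL, bucket, object, credentials and local path are placeholders, not values defined by this reference design.

    # Minimal sketch of a multi-threaded download from the appliance's
    # S3-compatible endpoint. Endpoint, bucket, key, credentials and the
    # destination path are placeholders.
    import boto3
    from boto3.s3.transfer import TransferConfig

    s3 = boto3.client(
        "s3",
        endpoint_url="https://storage.example.internal",  # placeholder endpoint
        aws_access_key_id="ACCESS_KEY",                    # placeholder credentials
        aws_secret_access_key="SECRET_KEY",
    )

    # Spread the transfer over multiple connections so that a single client
    # can approach the appliance's quoted single-client throughput.
    config = TransferConfig(
        multipart_threshold=64 * 1024 * 1024,  # use multipart above 64 MiB
        multipart_chunksize=64 * 1024 * 1024,  # 64 MiB parts
        max_concurrency=16,                    # 16 parallel threads/connections
    )

    s3.download_file("training-data", "dataset.tar",
                     "/localdata/dataset.tar", Config=config)

The provisioning APIs themselves are vendor-specific; the following sketch is hypothetical and only illustrates the kind of calls a per-vPOD provisioning workflow would make (creating an NFS filesystem restricted to the vPOD subnet, then an S3 access account). All endpoint paths, field names and values are invented for illustration.

    # Hypothetical provisioning calls; endpoint paths and payload fields are
    # invented for illustration and will differ for any real appliance API.
    import requests

    API = "https://appliance.example.internal/api/v1"  # placeholder API base URL
    HEADERS = {"Authorization": "Bearer <token>"}       # placeholder auth token

    # Create an NFS filesystem whose exports are restricted to the vPOD subnet.
    requests.post(
        f"{API}/filesystems",
        headers=HEADERS,
        json={"name": "vpod42-scratch", "allowed_subnets": ["10.42.0.0/24"]},
        timeout=30,
    ).raise_for_status()

    # Create an S3 access account scoped to the same vPOD.
    requests.post(
        f"{API}/s3/accounts",
        headers=HEADERS,
        json={"name": "vpod42-s3"},
        timeout=30,
    ).raise_for_status()
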
6.2. Ceph clusters

A Ceph cluster is built on a number of COTS servers with local NVME storage. It provides all the block-storage services needed by OpenStack Cinder, and can supply virtual disk volumes to virtual machines as required.

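As a brief illustration (separate from the deployment tooling described below), a Ceph-backed Cinder volume can be created and attached to a VM with the OpenStack Python SDK (openstacksdk). The cloud entry, server name, volume name and size below are placeholders.

    # Sketch: create a Cinder volume (backed by the Ceph cluster) and attach
    # it to an existing VM using openstacksdk. Names and size are placeholders.
    import openstack

    conn = openstack.connect(cloud="ipu-pod")   # clouds.yaml entry (placeholder)

    volume = conn.create_volume(size=100, name="scratch-vol", wait=True)  # size in GB
    server = conn.get_server("poplar-vm-01")    # placeholder VM name
    conn.attach_volume(server, volume, wait=True)
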
Ceph is deployed as containerised services using an open-source Ansible collection. This reference design uses v1.10.0 of the collection: https://github.com/stackhpc/ansible-collection-cephadm.

Access to the Ceph cluster is provided over the 100 GbE physical Ethernet network to ensure high performance (up to 4 GB/s for a virtual volume mounted on a VM).

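Where a quick sanity check of this figure is wanted from inside a VM, a purpose-built tool such as fio is the usual choice; purely as an illustration, the Python sketch below reads a mounted volume (or a large file on it) sequentially in large chunks and reports the achieved bandwidth. The device path is a placeholder, and buffered reads of this kind only give an indicative number.

    # Rough sequential-read bandwidth check for a volume presented to a VM.
    # The path is a placeholder; page-cache effects mean the result is only
    # indicative, and a real benchmark would normally use a tool such as fio.
    import time

    PATH = "/dev/vdb"          # placeholder: virtual volume attached to the VM
    CHUNK = 64 * 1024 * 1024   # 64 MiB reads
    TOTAL = 8 * 1024**3        # read 8 GiB in total

    read = 0
    start = time.monotonic()
    with open(PATH, "rb", buffering=0) as f:
        while read < TOTAL:
            data = f.read(CHUNK)
            if not data:
                break
            read += len(data)
    elapsed = time.monotonic() - start
    print(f"{read / elapsed / 1e9:.2f} GB/s over {read / 1e9:.1f} GB")
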
If required, this cluster can also be used to provide CephFS file storage, which can be managed with OpenStack Manila.

Note

Since the cluster runs in the OpenStack infrastructure networks, it cannot be directly exposed to end-user tenancies; a router must therefore be deployed to route client traffic to the Ceph subnet.

6.3. NVME drives

Each physical Poplar server has 7x NVME drives installed (minimum 1 TB each).

  • Poplar Hypervisor: NVME devices are configured as RAID0 or RAID6 volumes and used to provide ephemeral disks to each Poplar VM (via OpenStack Nova).

  • Ceph Hypervisor: most of the NVME capacity is passed to the Ceph cluster and the remaining portion to Nova.

  • Bare-metal Poplar: NVME devices are exposed directly to the operating system, configured as a RAID0 (or RAID6) volume, and mounted as /localdata (a Graphcore convention), as sketched below.
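
To illustrate the bare-metal /localdata convention, the following Python sketch wraps the usual mdadm, mkfs and mount steps. The device names, RAID level and choice of ext4 are assumptions made for illustration, not a prescribed procedure.

    # Sketch: assemble the local NVME drives into a RAID0 array and mount it
    # as /localdata on a bare-metal Poplar server. Device names, RAID level
    # and filesystem choice (ext4) are illustrative assumptions; run as root.
    import subprocess
    from pathlib import Path

    devices = [f"/dev/nvme{i}n1" for i in range(7)]  # the 7 local NVME drives (names vary)

    # Stripe all drives into a single RAID0 array (RAID6 would trade capacity
    # and speed for redundancy).
    subprocess.run(
        ["mdadm", "--create", "/dev/md0", "--level=0",
         f"--raid-devices={len(devices)}", *devices],
        check=True,
    )
    subprocess.run(["mkfs.ext4", "/dev/md0"], check=True)  # filesystem choice is an assumption

    Path("/localdata").mkdir(exist_ok=True)
    subprocess.run(["mount", "/dev/md0", "/localdata"], check=True)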