5. PopRun changelog

5.1. v2.2 (Poplar SDK 2.2)

5.1.1. New features

  • Added checks for the IPU/GW-link routing and sync type of existing partitions. The existing partition is checked against the values passed to --ipu-link-routing-type, --gw-link-routing-type and --sync-type. In case of a mismatch, the partition will be updated if --update-partition=yes is provided.

  • Improved error message when the application was terminated by SIGKILL.
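
The partition check described above can be pictured with a small model. This is only an illustrative sketch, not PopRun's implementation: the PartitionSettings class, the function names and the placeholder setting values are hypothetical, and raising an error when --update-partition=yes is absent is an assumption, since the changelog only states what happens when the flag is provided.

```python
from dataclasses import dataclass

# Names of the settings compared, mirroring --ipu-link-routing-type,
# --gw-link-routing-type and --sync-type.
CHECKED_FIELDS = ("ipu_link_routing_type", "gw_link_routing_type", "sync_type")


@dataclass(frozen=True)
class PartitionSettings:
    """Hypothetical container for the three checked partition settings."""
    ipu_link_routing_type: str
    gw_link_routing_type: str
    sync_type: str


def find_mismatches(existing, requested):
    """Return the names of the settings that differ between the two."""
    return [
        name for name in CHECKED_FIELDS
        if getattr(existing, name) != getattr(requested, name)
    ]


def resolve_partition(existing, requested, update_partition):
    """Return the settings the partition ends up with.

    Assumption: without an update request, a mismatch is treated as an error.
    """
    mismatches = find_mismatches(existing, requested)
    if not mismatches:
        return existing
    if update_partition:
        return requested
    raise ValueError("partition mismatch in: " + ", ".join(mismatches))


if __name__ == "__main__":
    # Placeholder values, not real V-IPU routing or sync type names.
    existing = PartitionSettings("routing-a", "routing-a", "sync-a")
    requested = PartitionSettings("routing-b", "routing-a", "sync-a")
    print(find_mismatches(existing, requested))
    print(resolve_partition(existing, requested, update_partition=True))
```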

5.2. v2.1 (Poplar SDK 2.1)

5.2.1. New features

  • Show full hostnames after the topology table if they cannot fit inside the table.

  • Added command-line arguments for additional V-IPU options: --ipu-link-routing-type, --gw-link-routing-type and --sync-type.

  • Improved error reporting when the user program is missing from the command-line invocation.

  • Added support for passing an environment variable to a specific instance by using --instance-mpi-local-args=<instance-index>:-x VAR=VALUE.

  • Added initial support for the Slurm workload manager. All the resources allocated by Slurm are made available to PopRun.

  • Removed dependency on the user locale. Avoids crashing in the case of an incorrectly configured user locale.

  • Improved NUMA node binding when using cpusets. Only the NUMA nodes allowed by the current cpuset are used.

  • Forward the V-IPU timeout argument --vipu-server-timeout to IPUoF by internally passing the environment variable IPUOF_VIPU_API_TIMEOUT.

  • Improved SSH error reporting. Instead of hanging on authentication issues, a clear error is reported.

  • Automatically enable the gateway mode target option when using V-IPU.

  • Added support for running programs in the current working directory without a ./ prefix for consistency with mpirun.

  • Automatically enable NUMA awareness when there is more than one instance per host.

  • Support passing --mpi-local-args and --mpi-global-args multiple times by merging the values.

  • Verify the final state of the partition after creation/reset. An error is reported if the partition was not created/reset correctly.

  • Get the V-IPU server address from the local V-IPU configuration if it is not specified as a command-line argument.

  • Set the target options based on values reported by the V-IPU server.
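
As an illustration of the --instance-mpi-local-args syntax and of passing --mpi-local-args more than once, the following sketch assembles a command line in Python. It is a hedged example only: the variable names, their values and the train.py program are hypothetical, the use of -x with --mpi-local-args is an assumption carried over from the per-instance form, and other options a real invocation may need are omitted.

```python
import shlex


def instance_env_arg(instance_index, name, value):
    """Format one option that passes VAR=VALUE to a single instance."""
    return f"--instance-mpi-local-args={instance_index}:-x {name}={value}"


command = [
    "poprun",
    # Passed twice; PopRun 2.1 merges the values of repeated options.
    "--mpi-local-args=-x SHARED_VAR=1",   # hypothetical variable
    "--mpi-local-args=-x OTHER_VAR=2",    # hypothetical variable
    # Only instance 0 receives this variable.
    instance_env_arg(0, "LOG_PREFIX", "first"),
    # Hypothetical user program.
    "python3",
    "train.py",
]

if __name__ == "__main__":
    # shlex.join quotes arguments containing spaces so the line can be
    # pasted into a shell.
    print(shlex.join(command))
```

Building the argument list first and quoting it with shlex.join keeps the space inside each option value from being split by the shell.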

5.3. v2.0 (Poplar SDK 2.0)

5.3.1. New features

  • Added documentation

  • POD native synchronisation support

  • Improved input validation

  • Offline mode support (running application without requiring IPUs)

  • Support for multiple IPU-Link domains and multiple hosts in offline mode

  • Newly created V-IPU partitions are not reset

  • Ability to specify a timeout for V-IPU server requests

  • Partitions created by PopRun will be automatically evicted

  • PopRun will provide interactive progress status while running

  • All available NUMA nodes may be used and pinned consecutively

  • OpenMPI 4.0 is now bundled with the Poplar SDK, removing OpenMPI as an external dependency.

  • Temporary executable caching to avoid redundant compilations on the same host

  • Added verification of the number of replicas in existing partitions

6. PopDist changelog

6.1. v2.2 (Poplar SDK 2.2)

6.1.1. New features

  • Improved error reporting in case of a missing IPU device.

6.2. v2.1 (Poplar SDK 2.1)

6.2.1. New features

  • Support offline mode with PopTorch without attaching to a device.

  • Prevent poptorch.Options.Distributed being changed when using PopDist.

  • Updated to use the new TensorFlow IPUConfig configuration API.

6.3. v2.0 (Poplar SDK 2.0)

6.3.1. New features

  • Added documentation

  • PopTorch support

  • Improved all user error messages

  • ipus_per_replica is now optional when calling getDeviceId
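
The optional ipus_per_replica argument can be illustrated with a small wrapper. This is a sketch under assumptions: the wrapper name select_device_id is hypothetical, it assumes the function is exposed as popdist.getDeviceId, and it assumes the program is launched in an environment where PopDist can resolve a device. When the popdist module is not installed, the wrapper returns None.

```python
def select_device_id(ipus_per_replica=None):
    """Return the device ID reported by PopDist, or None without PopDist.

    Assumes getDeviceId is available on the popdist module.
    """
    try:
        import popdist
    except ImportError:
        # PopDist is not installed in this environment.
        return None

    if ipus_per_replica is None:
        # Since v2.0 the argument may be omitted.
        return popdist.getDeviceId()
    # The argument can still be passed explicitly.
    return popdist.getDeviceId(ipus_per_replica)
```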