5. PopRun changelog

5.1. v2.3 (Poplar SDK 2.3)

5.1.1. New features

  • Export some commonly used environment variables by default. The environment variables PATH, LD_LIBRARY_PATH and PYTHONPATH are exported by default to all instances. Passing them explicitly with --mpi-local-args="-x ENV_VAR" is no longer needed.

  • Add support for Slurm hostlists. The --host argument now supports the Slurm hostlist syntax. For example, host[1-3,5] will expand to host1,host2,host3,host5.

  • Pick up configuration options from Slurm. The number of instances, replicas, IPUs per replica and the available hosts are picked up from Slurm environment variables if they exist. If an option is provided both by a command-line argument and by Slurm, the command-line argument takes precedence.

  • Allow disabling executable caching. The executable cache can be disabled by passing an empty string using --executable-cache-path "".

  • If there is only a single V-IPU partition available, it will now be used automatically without the need for specifying its name using --vipu-partition.

  • Increase default V-IPU server timeout. The default value of --vipu-server-timeout is now 120 seconds.

  • The new argument --only-stdout-from-instance allows suppressing the standard output from all instances except the given one. This is different from the existing --only-output-from-instance in that it allows standard error from all instances.
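The hostlist expansion described above can be illustrated with a short Python sketch. This is not PopRun's implementation: the function name expand_hostlist is hypothetical, and the sketch assumes a simplified subset of the Slurm syntax (one bracket group per entry, numeric ranges, optional zero padding).

```python
import re
from typing import List


def expand_hostlist(hostlist: str) -> List[str]:
    """Expand a simplified Slurm-style hostlist such as 'host[1-3,5]'.

    Hypothetical helper for illustration only; it handles a single
    bracket group per entry and numeric ranges.
    """
    hosts = []
    # Split on commas that are not inside square brackets.
    for entry in re.split(r",(?![^\[]*\])", hostlist):
        match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", entry)
        if match is None:
            hosts.append(entry)  # plain hostname, nothing to expand
            continue
        prefix, ranges, suffix = match.groups()
        for part in ranges.split(","):
            start, _, end = part.partition("-")
            end = end or start  # a single number is a range of one
            width = len(start)  # preserve zero padding, e.g. 01-03
            for i in range(int(start), int(end) + 1):
                hosts.append(f"{prefix}{i:0{width}d}{suffix}")
    return hosts


print(expand_hostlist("host[1-3,5]"))
# ['host1', 'host2', 'host3', 'host5']
```

The result for host[1-3,5] matches the expansion given in the changelog entry. Real Slurm hostlists allow more forms than this sketch covers.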

5.2. v2.2 (Poplar SDK 2.2)

5.2.1. New features

  • Added checks for IPU/GW-link routing and sync type of existing partitions. The existing partition is checked against the values passed to --ipu-link-routing-type, --gw-link-routing-type and --sync-type. In case of a mismatch, the partition will be updated if --update-partition=yes is provided.

  • Improved error message when the application was terminated by SIGKILL.
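For readers wrapping their own launch scripts, the following Python sketch shows one common way to recognise that a child process was terminated by SIGKILL on a POSIX system. It does not show how PopRun detects or reports this case; the function name describe_exit and the message wording are assumptions made for illustration.

```python
import signal


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable message (POSIX).

    Hypothetical helper; the wording of the messages is illustrative.
    """
    if returncode >= 0:
        return f"exited with status {returncode}"
    # Python's subprocess module reports termination by a signal
    # as the negated signal number.
    try:
        name = signal.Signals(-returncode).name
    except ValueError:
        name = f"signal {-returncode}"
    if name == "SIGKILL":
        return "terminated by SIGKILL (killed forcibly, no cleanup possible)"
    return f"terminated by {name}"


print(describe_exit(-signal.SIGKILL))
```

The return code would typically come from subprocess.run(...).returncode.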

5.3. v2.1 (Poplar SDK 2.1)

5.3.1. New features

  • Show full hostnames after the topology table if they cannot fit inside the table.

  • Added command-line arguments for additional V-IPU options: --ipu-link-routing-type, --gw-link-routing-type and --sync-type.

  • Improved error reporting when the user program is missing from the command-line invocation.

  • Added support for passing an environment variable to a specific instance by using --instance-mpi-local-args=<instance-index>:-x VAR=VALUE.

  • Added initial support for the Slurm workload manager. All the resources allocated by Slurm are made available to PopRun.

  • Removed dependency on the user locale. Avoids crashing in the case of an incorrectly configured user locale.

  • Improved NUMA node binding when using cpusets. Only the NUMA nodes allowed by the current cpuset are used.

  • Forward V-IPU timeout argument --vipu-server-timeout to IPUoF by internally passing the environment variable IPUOF_VIPU_API_TIMEOUT.

  • Improved SSH error reporting. Instead of hanging on authentication issues, a clear error is reported.

  • Automatically enable the gateway mode target option when using V-IPU.

  • Added support for running programs in the current working directory without a ./ prefix for consistency with mpirun.

  • Automatically enable NUMA awareness when there is more than one instance per host.

  • Support passing --mpi-local-args and --mpi-global-args multiple times by merging the values.

  • Verify the final state of the partition after creation/reset. An error is reported if the partition was not created/reset correctly.

  • Get the V-IPU server address from the local V-IPU configuration if it is not specified as a command-line argument.

  • Set the target options based on values reported by the V-IPU server.
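The merging of repeated --mpi-local-args values mentioned above can be pictured with the following Python sketch. The changelog does not describe how PopRun merges the values, so this assumes the simplest behaviour: the values are concatenated in the order given. The function name merged_mpi_local_args is hypothetical.

```python
import argparse
import shlex
from typing import List


def merged_mpi_local_args(argv: List[str]) -> List[str]:
    """Collect every --mpi-local-args value and merge them in order.

    Illustrative only; assumes merging means plain concatenation.
    """
    parser = argparse.ArgumentParser()
    # action="append" keeps each occurrence instead of overwriting it.
    parser.add_argument("--mpi-local-args", action="append", default=[])
    namespace, _unknown = parser.parse_known_args(argv)
    merged = []
    for value in namespace.mpi_local_args:
        merged.extend(shlex.split(value))  # split "-x FOO" into tokens
    return merged


print(merged_mpi_local_args([
    "--mpi-local-args=-x FOO",
    "--mpi-local-args=-x BAR=1",
]))
# ['-x', 'FOO', '-x', 'BAR=1']
```

Using an append action means a later occurrence of the argument adds to the earlier ones rather than replacing them.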

5.4. v2.0 (Poplar SDK 2.0)

5.4.1. New features

  • Added documentation

  • POD native synchronisation support

  • Improved input validation

  • Offline mode support (running application without requiring IPUs)

  • Support multi IPU-Link domain and multi-host in offline mode

  • Newly created V-IPU partitions are not reset

  • Ability to specify a timeout for V-IPU server requests

  • Partitions created by PopRun will be automatically evicted

  • PopRun will provide interactive progress status while running

  • All available NUMA nodes may be used and pinned consecutively

  • OpenMPI 4.0 is now bundled with the Poplar SDK, removing OpenMPI as an external dependency.

  • Temporary executable caching to avoid redundant compilations on the same host

  • Added verification of the number of replicas in existing partitions

6. PopDist changelog

6.1. v2.6 (Poplar SDK 2.6)

  • The ipus_per_replica parameter of popdist.tensorflow.set_ipu_config is deprecated. Its value is now deduced from the --ipus-per-replica argument passed to PopRun. The parameter will be removed in the next release.

6.3. v2.2 (Poplar SDK 2.2)

6.3.1. New features

  • Improved error reporting in case of a missing IPU device.

6.4. v2.1 (Poplar SDK 2.1)

6.4.1. New features

  • Support offline mode with PopTorch without attaching to device.

  • Prevent poptorch.Options.Distributed being changed when using PopDist.

  • Update to use new TensorFlow IPUConfig option configuration API.

6.5. v2.0 (Poplar SDK 2.0)

6.5.1. New features

  • Added documentation

  • PopTorch support

  • Improved all user error messages

  • ipus_per_replica is now optional when calling getDeviceId