7. PopRun changelog
7.1. v2.3 (Poplar SDK 2.3)
7.1.1. New features
Export some commonly used environment variables by default. The environment variables PATH, LD_LIBRARY_PATH and PYTHONPATH are exported by default to all instances. Passing them using --mpi-local-args="-x ENV_VAR" is no longer needed.
Add support for Slurm hostlists. The --host argument now supports the Slurm hostlist syntax. For example, host[1-3,5] will expand to host1,host2,host3,host5.
Pick up configuration options from Slurm. The number of instances, replicas, IPUs per replica and the available hosts are picked up from Slurm environment variables if they exist. If an option is provided both by a command-line argument and by Slurm, the command-line argument takes precedence.
Allow disabling executable caching. The executable cache can be disabled by passing an empty string using --executable-cache-path "".
If there is only a single V-IPU partition available, it will now be used automatically without the need for specifying its name using --vipu-partition.
Increase default V-IPU server timeout. The default value of --vipu-server-timeout is now 120 seconds.
The new argument --only-stdout-from-instance allows suppressing the standard output from all instances except the given one. This is different from the existing --only-output-from-instance in that it allows standard error from all instances.
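The hostlist expansion described above can be illustrated with a minimal Python sketch. It is not PopRun's implementation and covers only a simplified subset of the Slurm syntax (one bracket group per name, numeric ranges and single values). The function name expand_hostlist is hypothetical.

```python
import re
from typing import List


def expand_hostlist(hostlist: str) -> List[str]:
    """Expand a simplified Slurm-style hostlist such as "host[1-3,5]"."""
    hosts = []
    # Split on commas that are not inside square brackets.
    for part in re.split(r",(?![^\[]*\])", hostlist):
        match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", part)
        if match is None:
            hosts.append(part)  # plain hostname, nothing to expand
            continue
        prefix, ranges, suffix = match.groups()
        for item in ranges.split(","):
            start, _, end = item.partition("-")
            end = end or start  # a single value is a range of length one
            width = len(start)  # preserve zero padding, e.g. node[01-03]
            for n in range(int(start), int(end) + 1):
                hosts.append(f"{prefix}{n:0{width}d}{suffix}")
    return hosts


print(",".join(expand_hostlist("host[1-3,5]")))
```

For the example from the changelog entry, this prints host1,host2,host3,host5.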
7.2. v2.2 (Poplar SDK 2.2)
7.2.1. New features
Added checks for IPU/GW-link routing and sync type of existing partitions. The existing partition is checked against the values passed to --ipu-link-routing-type, --gw-link-routing-type and --sync-type. In case of a mismatch, the partition will be updated if --update-partition=yes is provided.
Improved error message when the application was terminated by SIGKILL.
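The partition check described above can be pictured with the following Python sketch. It only illustrates the decision logic stated in the entry and is not PopRun's code. The helper names find_mismatches and should_update are hypothetical, and the option values are placeholders rather than real routing or sync types.

```python
from typing import Dict, List

# Option names taken from the changelog entry above.
CHECKED_OPTIONS = (
    "--ipu-link-routing-type",
    "--gw-link-routing-type",
    "--sync-type",
)


def find_mismatches(existing: Dict[str, str], requested: Dict[str, str]) -> List[str]:
    """Return the checked options whose requested value differs from the partition's."""
    return [
        opt
        for opt in CHECKED_OPTIONS
        if opt in requested and existing.get(opt) != requested[opt]
    ]


def should_update(mismatches: List[str], update_partition: str) -> bool:
    """Update only when there is a mismatch and --update-partition=yes was given."""
    return bool(mismatches) and update_partition == "yes"


# Placeholder values, not real V-IPU settings.
existing = {"--ipu-link-routing-type": "type-a", "--sync-type": "sync-a"}
requested = {"--ipu-link-routing-type": "type-a", "--sync-type": "sync-b"}
mismatches = find_mismatches(existing, requested)
print(mismatches, should_update(mismatches, "yes"))
```

What happens on a mismatch without --update-partition=yes is not stated in the entry, so the sketch deliberately leaves that case undecided.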
7.3. v2.1 (Poplar SDK 2.1)
7.3.1. New features
Show full hostnames after the topology table if they cannot fit inside the table.
Added command-line arguments for additional V-IPU options: --ipu-link-routing-type, --gw-link-routing-type and --sync-type.
Improved error reporting when user program is missing from the command-line invocation.
Added support for passing an environment variable to a specific instance by using --instance-mpi-local-args=<instance-index>:-x VAR=VALUE.
Added initial support for the Slurm workload manager. All the resources allocated by Slurm are made available to PopRun.
Removed dependency on the user locale. Avoids crashing in the case of an incorrectly configured user locale.
Improved NUMA node binding when using cpusets. Only the NUMA nodes allowed by the current cpuset are used.
Forward V-IPU timeout argument --vipu-server-timeout to IPUoF by internally passing the environment variable IPUOF_VIPU_API_TIMEOUT.
Improved SSH error reporting. Instead of hanging on authentication issues, a clear error is reported.
Automatically enable the gateway mode target option when using V-IPU.
Added support for running programs in the current working directory without a ./ prefix for consistency with mpirun.
Automatically enable NUMA awareness when there is more than one instance per host.
Support passing --mpi-local-args and --mpi-global-args multiple times by merging the values.
Verify the final state of the partition after creation/reset. An error is reported if the partition was not created/reset correctly.
Get V-IPU server address from local V-IPU configuration if not specified as a command-line argument.
Set the target options based on values reported by the V-IPU server.
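As an illustration of the per-instance environment variable syntax listed above, the Python sketch below assembles a poprun argument list. The helper instance_env_args, the variable LOG_LEVEL and the program train.py are hypothetical. The sketch also assumes that --instance-mpi-local-args may be given once per instance, which the entry above does not confirm.

```python
from typing import Dict, List, Tuple


def instance_env_args(per_instance_env: Dict[int, Tuple[str, str]]) -> List[str]:
    """Build one --instance-mpi-local-args option per instance index."""
    args = []
    for index in sorted(per_instance_env):
        name, value = per_instance_env[index]
        # Syntax from the changelog: <instance-index>:-x VAR=VALUE
        args.append(f"--instance-mpi-local-args={index}:-x {name}={value}")
    return args


# Hypothetical variable name and user program, for illustration only.
command = [
    "poprun",
    *instance_env_args({0: ("LOG_LEVEL", "debug"), 1: ("LOG_LEVEL", "warn")}),
    "python3",
    "train.py",
]
print(command)
```

Each option is a single list element even though it contains a space, so the list can be handed to subprocess.run without shell quoting.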
7.4. v2.0 (Poplar SDK 2.0)
7.4.1. New features
Added documentation
POD native synchronisation support
Improved input validation
Offline mode support (running application without requiring IPUs)
Support multi IPU-Link domain and multi-host in offline mode
Newly created V-IPU partitions are not reset
Ability to specify a timeout for V-IPU server requests
Partitions created by PopRun will be automatically evicted
PopRun will provide interactive progress status while running
All available NUMA nodes may be used and pinned consecutively
OpenMPI 4.0 is now bundled with the Poplar SDK, removing OpenMPI as an external dependency.
Temporary executable caching to avoid redundant compilations on the same host
Added verification of the number of replicas in existing partitions
7.5. v1.0 (Poplar SDK 1.4)
7.5.1. New features
First release
8. PopDist changelog
8.1. v2.6 (Poplar SDK 2.6)
The ipus_per_replica parameter of popdist.tensorflow.set_ipu_config is deprecated. Its value is now deduced from the --ipus-per-replica parameter provided by PopRun. The ipus_per_replica parameter will be removed in the next release.
8.2. v2.3 (Poplar SDK 2.3)
No changes.
8.3. v2.2 (Poplar SDK 2.2)
8.3.1. New features
Improved error reporting in case of a missing IPU device.
8.4. v2.1 (Poplar SDK 2.1)
8.4.1. New features
Support offline mode with PopTorch without attaching to device.
Prevent poptorch.Options.Distributed being changed when using PopDist.
Update to use new TensorFlow IPUConfig option configuration API.
8.5. v2.0 (Poplar SDK 2.0)
8.5.1. New features
Added documentation
PopTorch support
Improved all user error messages
ipus_per_replica is now optional when calling getDeviceId
8.6. v1.0 (Poplar SDK 1.4)
8.6.1. New features
First release