7. PopRun changelog
Export some commonly used environment variables by default. Environment variables such as PYTHONPATH are now exported by default to all instances, so passing them with --mpi-local-args="-x ENV_VAR" is no longer needed.
Add support for Slurm hostlists. The --host argument now supports the Slurm hostlist syntax. For example, host[1-3,5] will expand to host1,host2,host3,host5.
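To illustrate the hostlist syntax, here is a minimal Python sketch of how such an expression can be expanded. It is not PopRun's implementation. The function name expand_hostlist is hypothetical, and only a simplified subset of the Slurm syntax is handled (one bracket group per entry, with zero padding preserved).

```python
import re


def expand_hostlist(hostlist: str) -> list[str]:
    """Expand a simplified Slurm-style hostlist such as "host[1-3,5]".

    Hypothetical helper: supports one bracketed group per entry and keeps
    zero padding (e.g. "node[01-03]"). Nested or multiple bracket groups
    are not handled in this sketch.
    """
    hosts = []
    # Split on commas that are not inside a bracket group.
    for entry in re.split(r",(?![^\[]*\])", hostlist):
        match = re.fullmatch(r"([^\[\]]*)\[([^\]]+)\]([^\[\]]*)", entry)
        if match is None:
            hosts.append(entry)  # plain hostname without a range
            continue
        prefix, ranges, suffix = match.groups()
        for part in ranges.split(","):
            if "-" in part:
                start, end = part.split("-", 1)
                width = len(start)  # preserve zero padding of the start value
                for i in range(int(start), int(end) + 1):
                    hosts.append(f"{prefix}{i:0{width}d}{suffix}")
            else:
                hosts.append(f"{prefix}{part}{suffix}")
    return hosts


print(expand_hostlist("host[1-3,5]"))
```

Splitting only on commas outside brackets lets plain hostnames and bracketed ranges be mixed in one list, for example "a,b[01-02]".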
Pick up configuration options from Slurm. The number of instances, replicas, IPUs per replica and the available hosts are picked up from Slurm environment variables if they exist. If an option is provided both by a command-line argument and by Slurm, the command-line argument takes precedence.
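The precedence rule can be sketched as follows. This is an illustration only: SLURM_NTASKS and SLURM_JOB_NODELIST are real Slurm environment variables, but the mapping from PopRun options to these variables and the resolve_option helper are hypothetical assumptions, not PopRun's actual behaviour.

```python
import os
from typing import Mapping, Optional

# Hypothetical mapping from option names to Slurm environment variables.
# This is an illustrative assumption, not PopRun's actual mapping.
SLURM_ENV_FOR_OPTION = {
    "num_instances": "SLURM_NTASKS",
    "host": "SLURM_JOB_NODELIST",
}


def resolve_option(name: str, cli_value: Optional[str],
                   env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the command-line value if given, else the Slurm value, else None."""
    if cli_value is not None:
        return cli_value  # the command-line argument takes precedence
    env_var = SLURM_ENV_FOR_OPTION.get(name)
    if env_var is not None and env_var in env:
        return env[env_var]  # fall back to the value provided by Slurm
    return None
```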
Allow disabling executable caching. The executable cache can be disabled by passing an empty string using
If there is only a single V-IPU partition available, it will now be used automatically without the need for specifying its name using
Increase default V-IPU server timeout. The default value of --vipu-server-timeout is now 120 seconds.
The new argument --only-stdout-from-instance allows suppressing the standard output from all instances except the given one. This differs from the existing --only-output-from-instance in that standard error is still shown for all instances.
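The difference between the two options can be modelled as a small filter deciding which streams of an instance are shown. This is a hypothetical sketch of the documented behaviour, not PopRun code; the function forwarded_streams is an assumption, and applying both filters when both options are given is also an assumption.

```python
from typing import Optional


def forwarded_streams(instance: int,
                      only_output_from: Optional[int] = None,
                      only_stdout_from: Optional[int] = None) -> set[str]:
    """Return which of {"stdout", "stderr"} from `instance` are shown."""
    streams = {"stdout", "stderr"}
    # --only-output-from-instance: other instances show neither stream.
    if only_output_from is not None and instance != only_output_from:
        streams.clear()
    # --only-stdout-from-instance: other instances still show stderr.
    if only_stdout_from is not None and instance != only_stdout_from:
        streams.discard("stdout")
    return streams
```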
Added checks for the IPU/GW-link routing and sync type of existing partitions. The existing partition is checked against the values passed to --sync-type. In case of a mismatch, the partition will be updated if
Improved error message when the application was terminated by SIGKILL.
Show full hostnames after the topology table if they cannot fit inside the table.
Added command-line arguments for additional V-IPU options:
Improved error reporting when the user program is missing from the command-line invocation.
Added support for passing an environment variable to a specific instance by using
Added initial support for the Slurm workload manager. All the resources allocated by Slurm are made available to PopRun.
Removed the dependency on the user locale. This avoids crashes when the user locale is incorrectly configured.
Improved NUMA node binding when using cpusets. Only the NUMA nodes allowed by the current cpuset are used.
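On Linux, the NUMA nodes permitted by the current cpuset are reported in the Mems_allowed_list field of /proc/self/status. The sketch below shows one way a launcher could read them; it is an assumption about a possible approach, not a description of how PopRun does it, and the helper names are hypothetical.

```python
from pathlib import Path


def parse_list_format(text: str) -> list[int]:
    """Parse the Linux list format, e.g. "0-1,3" -> [0, 1, 3]."""
    result = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            result.extend(range(int(first), int(last) + 1))
        else:
            result.append(int(part))
    return result


def allowed_numa_nodes(status_text: str) -> list[int]:
    """Extract the NUMA nodes allowed by the cpuset from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Mems_allowed_list:"):
            return parse_list_format(line.split(":", 1)[1])
    return []


if __name__ == "__main__":
    status = Path("/proc/self/status")
    if status.exists():  # only available on Linux
        print(allowed_numa_nodes(status.read_text()))
```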
Forward the V-IPU timeout argument --vipu-server-timeout to IPUoF by internally passing the environment variable
Improved SSH error reporting. Instead of hanging on authentication issues, a clear error is reported.
Automatically enable the gateway mode target option when using V-IPU.
Added support for running programs in the current working directory without a ./ prefix for consistency with
Automatically enable NUMA awareness when there is more than one instance per host.
Allow passing --mpi-global-args multiple times by merging the values.
Verify the final state of the partition after creation or reset. An error is reported if the partition was not created or reset correctly.
Get the V-IPU server address from the local V-IPU configuration if it is not specified as a command-line argument.
Set the target options based on values reported by the V-IPU server.
POD native synchronisation support
Improved input validation
Offline mode support (running an application without requiring IPUs)
Support for multiple IPU-Link domains and multiple hosts in offline mode
Newly created V-IPU partitions are not reset
Ability to specify a timeout for V-IPU server requests
Partitions created by PopRun will be automatically evicted
PopRun will provide interactive progress status while running
All available NUMA nodes may be used and pinned consecutively
OpenMPI 4.0 is now bundled with the Poplar SDK, removing OpenMPI as an external dependency.
Temporary executable caching to avoid redundant compilations on the same host
Added verification of the number of replicas in existing partitions
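NUMA-aware placement of several instances on one host can be pictured as assigning each local instance to the available NUMA nodes in order. The sketch below is a hypothetical illustration: interpreting "pinned consecutively" as cycling through the nodes in order, and skipping binding for a single instance per host, are both assumptions, and numa_binding is not a PopRun function.

```python
def numa_binding(instances_per_host: int, numa_nodes: list[int]) -> dict[int, int]:
    """Map each local instance index to a NUMA node, cycling through nodes in order.

    Returns an empty mapping when there is only one instance per host, in
    which case no NUMA binding is applied in this sketch.
    """
    if instances_per_host <= 1 or not numa_nodes:
        return {}
    return {i: numa_nodes[i % len(numa_nodes)] for i in range(instances_per_host)}


print(numa_binding(4, [0, 1]))
```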
8. PopDist changelog
The ipus_per_replica parameter of popdist.tensorflow.set_ipu_config is deprecated; its value is now deduced from the --ipus-per-replica argument provided by PopRun. This parameter will be removed in the next release.
Improved error reporting in case of a missing IPU device.
Support offline mode with PopTorch without attaching to a device.
Prevent poptorch.Options.Distributed being changed when using PopDist.
Updated to use the new TensorFlow IPUConfig configuration API.
Improved all user error messages
ipus_per_replica is now optional when calling getDeviceId