9. Known issues

9.1. Race condition with multiple users of the same partition

If multiple users share the same reconfigurable V-IPU partition, a race condition can allow another user's process to interfere with a PopRun launch, even when the partition has sufficient resources for both users.

A PopRun launch on a reconfigurable partition consists of two phases:

  1. PopRun first acquires and configures a Poplar parent device with all the IPUs required.

  2. Next, each instance acquires its designated Poplar child device of that parent device. The child device consists of the IPUs allocated to that instance.

The race condition occurs if another process acquires any of the required IPUs between phases 1 and 2. When an instance then tries to attach to its child device in phase 2, the attach fails because those IPUs are already in use.
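The interleaving described above can be illustrated with a small toy model. This is a hypothetical sketch, not the Poplar or V-IPU API: `IpuPool`, `poprun_launch` and `between_phases` are illustrative names, and the model assumes that nothing keeps the IPUs reserved for the instances between phase 1 and phase 2, which is what opens the window for another process.

```python
import threading


class IpuPool:
    """Hypothetical in-memory model of a partition's IPUs (not a Poplar/V-IPU API)."""

    def __init__(self, num_ipus):
        self._lock = threading.Lock()
        self._owner = {ipu: None for ipu in range(num_ipus)}

    def available(self, ipus):
        # True if none of the requested IPUs are currently owned.
        with self._lock:
            return all(self._owner[i] is None for i in ipus)

    def try_acquire(self, ipus, owner):
        # Atomically claim all requested IPUs, or none of them.
        with self._lock:
            if any(self._owner[i] is not None for i in ipus):
                return False
            for i in ipus:
                self._owner[i] = owner
            return True


def poprun_launch(pool, instance_ipus, between_phases=None):
    """Toy two-phase launch; instance_ipus lists the IPUs for each instance."""
    all_ipus = sorted(i for ipus in instance_ipus for i in ipus)
    # Phase 1: check that all required IPUs are free and "configure" the parent.
    if not pool.available(all_ipus):
        return "phase 1 failed"
    # Assumption of this model: the IPUs are not held between the phases,
    # so another user's process may run here and take one of them.
    if between_phases is not None:
        between_phases()
    # Phase 2: each instance attaches to its own child device.
    for n, ipus in enumerate(instance_ipus):
        if not pool.try_acquire(ipus, owner=f"instance-{n}"):
            return f"phase 2 failed: instance {n} IPUs {ipus} already in use"
    return "launched"


# Without interference, two instances of two IPUs each launch successfully.
print(poprun_launch(IpuPool(8), [[0, 1], [2, 3]]))

# Another user grabs IPU 2 between the phases. The 8-IPU pool has room for
# both users, yet instance 1 fails to attach in phase 2.
pool = IpuPool(8)
print(poprun_launch(pool, [[0, 1], [2, 3]],
                    between_phases=lambda: pool.try_acquire([2], owner="other-user")))
```

In the second call the other user needs only one IPU and four IPUs remain unused, so a free IPU exists for it. The failure comes purely from the timing of the acquisition, matching the description of the race above.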