2. PopEF file format

A serialised AI model that can be run on an IPU can be broken down into several components. Each of these components is called a blob and can be both read and written. The blob types are:

  1. PoplarExecutable

  2. Metadata

  3. TensorData

  4. FeedData

  5. OpaqueBlob

Each blob is a separate, independent part, and together the blobs form a complete model. Independent blobs can therefore be combined into a model. Furthermore, the blob structure allows a single model to be assembled from several files, or many models to be stored in a single file. Each blob type is described in more detail later in this chapter.

_images/popef_format.png

2.1. Blob header

At a high level, a PopEF file is a list of blobs, each of which starts with a BlobHeader.

Each header contains:

  • A version string: used by the consumer to assess how to handle a given blob.

  • An enum type:

    • PoplarExecutable

    • Metadata

    • TensorData

    • FeedData

    • OpaqueBlob

  • The total blob size as an integer value: the size of the header plus the size of the blob content.

A very basic PopEF consumer can list the contents of a file by parsing the first BlobHeader, then jumping to the next header using the size field from the first header and repeating this until it reaches the end of the file.
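The header walk described above can be sketched in Python as follows. The byte layout used here (a 16-byte version field, a one-byte type tag and a little-endian 64-bit size) is a hypothetical stand-in chosen only for illustration; it is not the real PopEF binary layout, and make_blob and list_blobs are illustrative names rather than PopEF functions.

```python
import struct

# Hypothetical header layout for illustration only (NOT the real PopEF layout):
# 16-byte NUL-padded version string, 1-byte blob type, 8-byte little-endian total size.
HEADER = struct.Struct("<16sBQ")
BLOB_TYPES = ["PoplarExecutable", "Metadata", "TensorData", "FeedData", "OpaqueBlob"]


def make_blob(version, blob_type, payload):
    """Build a blob: header followed by payload. The size field covers both."""
    total = HEADER.size + len(payload)
    return HEADER.pack(version.encode(), BLOB_TYPES.index(blob_type), total) + payload


def list_blobs(data):
    """Walk the buffer header by header, using each size field to jump ahead."""
    blobs, offset = [], 0
    while offset < len(data):
        if len(data) - offset < HEADER.size:
            raise ValueError(f"truncated header at offset {offset}")
        version, type_id, total = HEADER.unpack_from(data, offset)
        if total < HEADER.size or offset + total > len(data):
            raise ValueError(f"corrupt blob size at offset {offset}")
        blobs.append((BLOB_TYPES[type_id], version.rstrip(b"\0").decode(), total))
        offset += total  # the size includes the header, so this lands on the next header
    return blobs


# Two synthetic blobs concatenated into one in-memory "file".
demo = make_blob("1.0", "Metadata", b"\x01\x02") + make_blob("1.0", "TensorData", b"\x00" * 8)
for blob_type, version, size in list_blobs(demo):
    print(blob_type, version, size)
```

Because the size field includes the header itself, the consumer never needs to understand a blob's content in order to skip over it.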

_images/file_structure.png

2.2. Poplar executable

This blob contains a serialised poplar::Executable.

These blobs are identified by a name which is used to link them to other blobs (Metadata, OpaqueBlob). This field therefore cannot be empty and must be unique among all PoplarExecutable blobs that are read.

Note

The Poplar library does not guarantee compatibility when an executable was exported with a version of Poplar that differs from that of the target runtime (there is no backward compatibility). In other words, you must use the same version of the Poplar library when exporting, importing and running the model to make sure your program works correctly.

Note

A PoplarExecutable blob can take up a relatively large amount of disk space, so it can be compressed. Compression is selected when the executable blob is created. Please refer to popef::Writer::createExecutable().

2.3. Opaque

Opaque blobs can be thought of as black boxes: they can contain any information. They are usually used to store binary, framework-specific information and can usually only be interpreted by the framework that created them. An exception to that rule is the inference use case. Triton Server, supported by the Poplar Triton Backend, is framework agnostic: a model can be used regardless of which framework exported it, provided that the whole functionality of the model is enclosed in the Poplar executable and no steps that run before or after the Poplar executable are defined elsewhere (for example, in a TensorFlow model).

These blobs help frameworks transition from fully opaque formats to a shared framework-agnostic format. Frameworks that use PopEF can store data which PopEF does not otherwise support in opaque blobs.

Opaque blobs are identified by name (usually the name of the framework which generated them) and are linked to a given executable.

2.4. Metadata

Metadata blobs contain the information necessary to load and run a Poplar executable.

The executable is identified by the executable field.

2.4.1. Target runtime

The metadata contains information about the model’s execution target. This includes the number and version of IPUs used by the model. In addition, the data includes options for the Poplar library and devices.

2.4.2. Program flow

Poplar executables are organised as a vector of runnable programs, which can be grouped into so-called program flows.

The number of programs present in an executable, and what they do, depends on the framework which created them (or on the user, if they created the model by hand using, for example, the Poplar library API). Typically, an inference model will have at least one program to upload the weights from the host to the target (which should be called once at the beginning), and a main program that uploads some inputs to the target, runs the model and downloads the outputs back to the host.

The following pseudocode illustrates how a runtime would use the information contained in the program flows.

model = loadFromFile(filename)
# Transfer the weights to the IPU
model.runLoad()
for input in inputs:
  # Run the model and transfer the output back to the host.
  output = model.runMain(input)

Which programs need to be called as part of load and main is captured in the ProgramFlow.
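The pseudocode can be made slightly more concrete by treating each flow as an ordered list of program indices. In the sketch below, ProgramFlow (as a Python class), run_flow and the run_program callback are hypothetical names introduced for illustration; they are not the PopEF or Poplar API. The index values are borrowed from the popef_dump output shown in Section 2.6.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ProgramFlow:
    """Hypothetical container: each flow is an ordered list of program indices."""
    load: List[int] = field(default_factory=list)
    main: List[int] = field(default_factory=list)
    save: List[int] = field(default_factory=list)


def run_flow(programs: List[int], run_program: Callable[[int], None]) -> None:
    """Run every program of one flow, in the order the flow lists them."""
    for index in programs:
        run_program(index)


flow = ProgramFlow(load=[0, 1], main=[5], save=[7])
executed = []  # stands in for a real engine: records the indices that would be run

run_flow(flow.load, executed.append)      # once, at the beginning: upload the weights
for _ in range(3):                        # once per set of inputs
    run_flow(flow.main, executed.append)
print(executed)
```

A real runtime would replace executed.append with a call that runs the program with the given index on the device.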

2.4.3. Anchors

A Poplar program doesn’t take input arguments or return output values. Instead, it exposes a list of string handles to which the runtime needs to connect callbacks. Anchors allow you to manage model inputs and outputs because they contain all the necessary information:

  1. The tensor name. In PopEF, tensors are identified by name. This is a human-readable string which typically comes from the original high-level model (for example, model.layer.weights or model.layer.bias).

  2. The handle name. It refers to the callback that the user must define to send data to, or receive data from, the program. At runtime, when Poplar encounters one of these handles as part of a program, it will call the callback function with a pointer.

  3. The set of programs in which the tensor is used.

  4. The tensor shape and its data type.

  5. Whether this is an input or an output tensor. In other words, whether data should be copied from or to the pointer received in the callback.

  6. Whether the tensor contains different data in different replicas.

  7. Whether the tensor should be copied to a remote buffer (memory outside the IPU that can be read and written by the IPU), and how many repeat blocks should be transferred (a remote buffer can store multiple blocks of the same size).

Anchors can be associated with TensorData or FeedData blobs by name. In this case, the data can be delivered to the model based on the data contained in the blob. Otherwise, you must remember to provide your data for the program when executing the model, because the Anchor only contains information about the tensor and does not contain any of its data.
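As a rough illustration of the information listed above, the following sketch models an anchor as a plain Python dataclass and finds the anchors for which you must supply data yourself, that is, those with no TensorData or FeedData blob of the same name. The class, its field names and the user_provided helper are hypothetical and are not the PopEF API; the sample values are taken from the popef_dump output in Section 2.6.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Anchor:
    """Hypothetical mirror of the fields listed above (not the PopEF class)."""
    name: str
    handle: str
    programs: List[int]
    shape: List[int]
    dtype: str
    is_input: bool
    is_per_replica: bool = False
    use_remote_buffers: bool = False
    repeats: int = 1


def user_provided(anchors: List[Anchor], blob_names: Set[str]) -> List[Anchor]:
    """Return the anchors that have no TensorData or FeedData blob of the same name."""
    return [a for a in anchors if a.name not in blob_names]


anchors = [
    Anchor("user_input", "h2d_user_input", [5], [2], "F32", is_input=True, is_per_replica=True),
    Anchor("input_parameter", "h2d_input_parameter", [0], [2], "F32", is_input=True),
    Anchor("Add:0", "anchor_d2h_Add:0", [5], [2], "F32", is_input=False, is_per_replica=True),
]

# Names of the TensorData and FeedData blobs found in the file.
needs_data = user_provided(anchors, blob_names={"input_parameter"})
print([a.name for a in needs_data])
```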

Note

Using the replication and remote buffer features can affect the shape inside the TensorInfo of a given Anchor tensor. All unique replicas and repeat blocks must be placed in one memory block, so the shape of a single tensor may be implicitly extended by two outer dimensions: replication_factor and repeats. If use_remote_buffers is set to True and repeats is greater than one, the repeats dimension is added. If is_per_replica is set to True and replication_factor is greater than one, the replication_factor dimension is added. The replication_factor dimension always precedes the repeats dimension.

Examples:

  1. Configuration: tensor_shape={8, 3, 1}, use_remote_buffers = True, repeats = 3, replication_factor = 2, is_per_replica = True gives the result TensorInfo{shape={2, 3, 8, 3, 1}}

  2. Configuration: tensor_shape={8, 3, 1}, use_remote_buffers = True, repeats = 3, replication_factor = 1, is_per_replica = True gives the result TensorInfo{shape={3, 8, 3, 1}}

  3. Configuration: tensor_shape={8, 3, 1}, use_remote_buffers = True, repeats = 3, replication_factor = 2, is_per_replica = False gives the result TensorInfo{shape={3, 8, 3, 1}}

  4. Configuration: tensor_shape={8, 3, 1}, use_remote_buffers = False, repeats = 1, replication_factor = 1, is_per_replica = False gives the result TensorInfo{shape={8, 3, 1}}

  5. Configuration: tensor_shape={8, 3, 1}, use_remote_buffers = False, repeats = 3, replication_factor = 2, is_per_replica = True gives the result TensorInfo{shape={2, 8, 3, 1}}

  6. Configuration: tensor_shape={8, 3, 1}, use_remote_buffers = True, repeats = 1, replication_factor = 2, is_per_replica = True gives the result TensorInfo{shape={2, 8, 3, 1}}
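The shape rule in the note can be written as a small Python function. This is a sketch of the rule as stated in this section, not code from the PopEF library; extended_shape is a hypothetical helper name, while its parameters follow the names used in the note.

```python
def extended_shape(tensor_shape, use_remote_buffers, repeats,
                   replication_factor, is_per_replica):
    """Apply the implicit outer dimensions described in the note above."""
    shape = list(tensor_shape)
    # The repeats dimension is added only for remote buffers with more than one block.
    if use_remote_buffers and repeats > 1:
        shape.insert(0, repeats)
    # The replication dimension is added last so that it precedes the repeats dimension.
    if is_per_replica and replication_factor > 1:
        shape.insert(0, replication_factor)
    return shape


# First example configuration from the list above.
print(extended_shape([8, 3, 1], True, 3, 2, True))   # [2, 3, 8, 3, 1]
```

Inserting the repeats dimension first and the replication dimension second is what guarantees the ordering stated in the note.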

2.5. Tensor and Feed Data

The data needed to run a model can be stored in two different structures:

  • TensorDataInfo - a single data item for a single tensor, usually used for model parameters (for example model weights).

  • FeedDataInfo - multiple data items for a single tensor, usually used for model inputs (for example an image set).

FeedDataInfo contains information on how many tensors are in the block, in other words, how many times the input for a model can be provided. Both structures contain a unique name, so that the data can be associated with the anchor in the metadata. They also contain the tensor shape and data type. The available data types are:

  • BOOL: boolean

  • F16: 16-bit floating point

  • F32: 32-bit floating point

  • F64: 64-bit floating point

  • S8: 8-bit signed integer

  • U8: 8-bit unsigned integer

  • S16: 16-bit signed integer

  • U16: 16-bit unsigned integer

  • S32: 32-bit signed integer

  • U32: 32-bit unsigned integer

  • S64: 64-bit signed integer

  • U64: 64-bit unsigned integer

Both these structures are immediately followed by the actual data.

The size of this data must exactly match the size implied by the shape and data type in TensorInfo (no padding), multiplied, for feeds, by the number of tensors (no gaps between tensors).

Please see the figure below, which shows the arrangement of the data described above in the blob:

_images/tensor_data_structure.png
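The size rule above amounts to simple arithmetic, sketched below in Python. The helper name data_size_in_bytes is hypothetical, and the one-byte size assumed for BOOL is an assumption rather than something stated in this chapter; the other sizes follow directly from the bit widths in the list of data types.

```python
from math import prod

# Bytes per element. The BOOL entry is an assumption; the rest follow from the bit widths.
DTYPE_SIZE = {
    "BOOL": 1,
    "F16": 2, "F32": 4, "F64": 8,
    "S8": 1, "U8": 1, "S16": 2, "U16": 2,
    "S32": 4, "U32": 4, "S64": 8, "U64": 8,
}


def data_size_in_bytes(shape, dtype, num_tensors=1):
    """Elements per tensor x bytes per element x number of tensors (1 for TensorData)."""
    return prod(shape) * DTYPE_SIZE[dtype] * num_tensors


# A two-element F32 tensor stored as TensorData.
print(data_size_in_bytes([2], "F32"))                      # 8
# A feed holding four F16 tensors of shape [8, 3, 1].
print(data_size_in_bytes([8, 3, 1], "F16", num_tensors=4))  # 192
```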

2.6. PopEF file analysis

The Poplar SDK provides the popef_dump tool. It allows you to analyze a PopEF file without using the C++ or Python API. It shows the file structure and indicates which blobs are included, with the blob’s basic information (it displays all the information in the case of a Metadata blob). However, it does not allow you to view the binary content of the blob. The tool allows you to read several files at the same time.

popef_dump has the following syntax:

popef_dump [options] <popef file path> [<popef file path>]...

Available popef_dump options:

--all

Display all the PopEF information listed below.

-m [ --metadata ]

Display the content of Metadata blobs. Metadata blobs contain the information necessary to load and run a Poplar executable.

-a [ --anchors ]

Display information for Anchors. This includes string handles that the runtime needs to connect callbacks to. Anchors contain all the necessary information to allow you to manage model inputs and outputs.

-u [ --user-anchors ]

Display user-provided inputs and outputs. That is, a list of the anchors that are not associated with either TensorData or FeedData blobs. In these cases, you must provide that data for the program when executing the model, because the anchor only contains information about the tensor and not any of its data.

-e [ --execs ]

Display information for PoplarExecutable blobs. These blobs contain a serialised Poplar executable.

-t [ --tensors ]

Display information for TensorData blobs. A TensorData blob contains data for one tensor, usually used for model parameters (for example model weights). The size of the tensor data in bytes is equal to the number of elements in the tensor multiplied by the data type size.

-f [ --feeds ]

Display information for FeedData blobs. A FeedData blob contains multiple tensors for one input anchor, usually used for model inputs (for example an image set). The size of the feed data in bytes is equal to the number of tensors in the blob multiplied by the number of elements in each tensor and by the data type size. If the number of tensors in a FeedData blob is larger than the corresponding anchor’s batch_size, the executable will have to be run multiple times to consume all the data from the FeedData blob.

-o [ --opaques ]

Display OpaqueBlob information. An OpaqueBlob can contain any data. Opaque blobs are usually used to store binary framework-specific information and can usually only be interpreted by the framework that created them.

-c [ --color ]

Use colour formatting.

-v [ --verbose ]

Set the DEBUG log level for the PopEF library.

-h [ --help ]

Print help message.

Below is an example of the output generated by the tool. The PopART framework was used to generate the PopEF file. The model contained in the PopEF file implements the following computation:

Add:0 = user_input + input_parameter

Each of the above tensors is a two-element float32 vector. input_parameter is a model parameter supplied by the TensorData blob, user_input is the input provided by the user, and Add:0 is the model output.

Listing 2.1 Example of Anchors, TensorData, PoplarExecutable and OpaqueBlob information displayed by popef_dump.
$ popef_dump -a -t -e -o executable.popef

PopEF file: executable.popef
Anchors:
  Inputs (User provided):
    Name: "user_input":
      TensorInfo: { dtype: F32, sizeInBytes: 8, shape [2] }
      Programs: [5]
      Handle: h2d_user_input
      IsPerReplica: True
  Inputs (Popef provided):
    Name: "input_parameter":
      TensorInfo: { dtype: F32, sizeInBytes: 8, shape [2] }
      Programs: [0]
      Handle: h2d_input_parameter
      IsPerReplica: False
  Outputs (User provided):
    Name: "Add:0":
      TensorInfo: { dtype: F32, sizeInBytes: 8, shape [2] }
      Programs: [5]
      Handle: anchor_d2h_Add:0
      IsPerReplica: True
  Outputs (Popef provided):
    Name: "input_parameter":
      TensorInfo: { dtype: F32, sizeInBytes: 8, shape [2] }
      Programs: [7]
      Handle: weight_d2h_input_parameter
      IsPerReplica: False
Executables:
  Name: "3284579348837926701":
    Is compressed: False
    Version:
    Available read size: 63682
Tensors:
  Name: "input_parameter":
    Version:
    Available read size: 16
    TensorInfo: { dtype: F32, sizeInBytes: 8, shape [2] }
Opaques:
  Name: "popart":
    Executable: 3284579348837926701
    Version:
    Available read size: 1536

Listing 2.2 Example of PopEF’s Metadata content (Anchors section is omitted).
$ popef_dump -m executable.popef

PopEF file: executable.popef
Metadata:
    Version:
    IpuVersion: 2
    Executable: 3284579348837926701
    Replication Factor: 1
    NumIpus: 1
    SeedHandle:
    IsInference: True
    IsPOD: False
    NumIpus: 1
    NumProcesses: 1
  Engine Options:
    debug.retainDebugInformation: true
    exchange.enablePrefetch: true
    exchange.streamBufferOverlap: hostRearrangeOnly
    target.deterministicWorkers: false
  Program Flow:
    load: [0, 1]
    main: [5]
    save: [7]
  Programs Map:
    0: WeightsFromHost
    1: OptimizerFromHost
    2: RandomSeedFromHost
    3: RandomSeedToHost
    4: RngStateFromHost
    5: Program
    6: RngStateToHost
    7: WeightsToHost
    8: CycleCountTensorToHost
    9: CustomProgramsStart
  Device Options:  <empty>