5.12. Manual sharding

PopRT manual sharding supports dividing a model into subgraphs at sharding points provided by the user, in order to achieve model parallelism and pipeline parallelism.

5.12.1. Sharding and model parallelism

PopRT supports sharding the ONNX graph across different devices based on the sharding points provided by users to achieve model parallelism. Sharding is suitable for large models that exceed the memory limit of a single device and require multiple devices.

For more information, refer to the section on sharding in the technical note.

Note

To use model parallelism, the PopRT backend options need to be set as follows (a minimal sketch is shown after the list):

  • options.virtual_graph_mode = "manual"

  • options.num_ipus = number of devices
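
A minimal sketch of these settings, assuming the model is sharded across two IPUs (the option names match those used in the compile example in Section 5.12.5):

from poprt.compiler import CompilerOptions

options = CompilerOptions()
# Shard the graph onto devices according to the manual sharding information
options.virtual_graph_mode = "manual"
# Number of devices the model is sharded across (2 is assumed here)
options.num_ipus = 2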

5.12.2. Pipelining and pipeline parallelism

PopRT supports sharding the ONNX graph across different pipeline stages based on the sharding points provided by users to achieve pipeline parallelism and improve throughput.

For more information, refer to the sections on pipelining in the technical note and in the IPU Programmer’s Guide.

Note

To use pipeline parallelism, it is necessary to enable model parallelism and set the PopRT backend options as follows (a minimal sketch is shown after the list):

  • options.enable_pipelining = True

  • options.batches_per_step = an integer multiple of the number of pipeline stages
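
Building on the sketch above, pipeline parallelism additionally requires the following settings (three pipeline stages are assumed here, so batches_per_step must be a multiple of 3):

# Enable pipelining on top of manual model parallelism
options.enable_pipelining = True
# An integer multiple of the number of pipeline stages (3 stages assumed)
options.batches_per_step = 12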

5.12.3. Manual sharding process

PopRT manual sharding shards the ONNX graph at the level of ONNX nodes; any ONNX node can be used as a sharding point.

  1. The nodes in the ONNX graph are arranged in topological order. PopRT manual sharding first sorts the sharding points set by the user topologically.

  2. Traverse the sharding points. Starting from each sharding point, traverse the ONNX graph in the direction of its inputs and put all traversed ONNX nodes into a subgraph. If a node has no inputs, or already has sharding information set, stop traversing that branch.

  3. After the traversal is completed, a subgraph is obtained. The sharding information of the subgraph is set using ONNX attributes:

    • __ipu_number specifies the device serial number corresponding to each subgraph in model parallelism.

    • __pipeline_stage specifies the pipeline stage corresponding to each subgraph in pipeline parallelism.

Note

  • Different sharding points can have the same device serial number and pipeline stage. For example, if there are two parallel branches starting from different sharding points and you want to put them onto a single device, then these two sharding points will have the same device serial number.

  • After the sharding information is set based on the sharding points, the remaining nodes without sharding information are automatically assigned:

    • __ipu_number is set to the currently set maximum device serial number + 1.

    • __pipeline_stage is set to the currently set maximum pipeline stage + 1.
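
As an illustrative sketch (not part of the PopRT API), the sharding attributes written by manual sharding can be inspected on the sharded ONNX model with the onnx package; the file name sharded.onnx is an assumption:

import onnx
from onnx import helper

# Assumption: sharded.onnx has already been through manual sharding, so its
# nodes carry the __ipu_number and __pipeline_stage attributes.
model = onnx.load("sharded.onnx")

for node in model.graph.node:
    attrs = {a.name: helper.get_attribute_value(a) for a in node.attribute}
    if "__ipu_number" in attrs or "__pipeline_stage" in attrs:
        print(
            f"{node.name}: ipu={attrs.get('__ipu_number')}, "
            f"stage={attrs.get('__pipeline_stage')}"
        )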

5.12.4. Configuring manual sharding

There are two methods for configuring manual sharding:

  • with the PopRT CLI

  • with the Python API (the poprt.converter.Sharder class).

Configuring manual sharding with the PopRT CLI

  • Specify the sharding point name, device serial number and pipeline stage in a YAML file:

Listing 5.17 shard.yaml
-
  node: resnetv17_stage1__plus0
  device: 0
  stage: 0
-
  node: resnetv17_stage4_batchnorm2_fwd
  device: 1
  stage: 1
-
  node: resnetv17_stage4__plus0
  device: 2
  stage: 2

Download shard.yaml

  • Configure the sharding information with --manual_sharding_config in the PopRT CLI:

poprt \
    --input_model model.onnx \
    --manual_sharding_config shard.yaml
  • Determine whether manual sharding alone is performed on input_model with --only_manual_sharding in the PopRT CLI. This option is not set by default.

    Not setting --only_manual_sharding means that manual sharding is performed after the Convert phase optimisation of input_model.

    Setting --only_manual_sharding means that only manual sharding is performed on input_model. In this case only --input_model, --output_model, --output_dir and --manual_sharding_config are supported; other parameters are invalid.

poprt \
    --input_model model.onnx \
    --manual_sharding_config shard.yaml \
    --only_manual_sharding

Configuring manual sharding with the Python API

You can use poprt.converter.Sharder to configure manual sharding.

sharding_info = {
    "resnetv17_stage1__plus0": 0,
    "resnetv17_stage4_batchnorm2_fwd": 1,
    "resnetv17_stage4__plus0": 2,
}
pipelining_info = {
    "resnetv17_stage1__plus0": 0,
    "resnetv17_stage4_batchnorm2_fwd": 1,
    "resnetv17_stage4__plus0": 2,
}

sharded_model = poprt.converter.Sharder(
                            sharding_info=sharding_info,
                            pipelining_info=pipelining_info
                        ).run(converted_model)

Note

  • The PopRT CLI with --only_manual_sharding set, or the poprt.converter.Sharder API, requires that every node in the ONNX graph has a unique name.

  • The PopRT CLI without --only_manual_sharding set does not require that every node in the ONNX graph has a unique name, because the Convert optimisation process guarantees that every node has a unique name.
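
As an illustrative sketch (not part of PopRT), node-name uniqueness can be checked with the onnx package before calling the Sharder; model.onnx is an assumed file name:

import onnx

model = onnx.load("model.onnx")
names = [node.name for node in model.graph.node]

# Every node needs a non-empty, unique name for manual sharding.
assert all(names), "Some nodes have empty names"
assert len(names) == len(set(names)), "Node names are not unique"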

5.12.5. Example

The following is a simple example of manual sharding, taking ResNet50 as an example.

Listing 5.18 shard.py
# Copyright (c) 2023 Graphcore Ltd. All rights reserved.
import numpy as np
import onnx
import requests

from poprt import runtime
from poprt.compiler import Compiler, CompilerOptions
from poprt.converter import Sharder


def load_model():
    # Download model
    url = 'https://github.com/onnx/models/raw/main/vision/classification/resnet/model/resnet50-v1-7.onnx'
    response = requests.get(url)
    if response.status_code == 200:
        model = onnx.load_model_from_string(response.content)
    else:
        raise Exception(
            f"Failed to download model with status_code {response.status_code}"
        )
    return model


def manual_sharding(model):
    # Fix the batch size to 1
    model.graph.input[0].type.tensor_type.shape.dim[0].dim_value = 1

    # Sharding and pipelining info
    sharding_info = {
        "resnetv17_stage1__plus0": 0,
        "resnetv17_stage4_batchnorm2_fwd": 1,
        "resnetv17_stage4__plus0": 2,
    }
    pipelining_info = {
        "resnetv17_stage1__plus0": 0,
        "resnetv17_stage4_batchnorm2_fwd": 1,
        "resnetv17_stage4__plus0": 2,
    }
    model = Sharder(sharding_info=sharding_info, pipelining_info=pipelining_info).run(
        model
    )

    return model


def compile(model):
    # Compile the model with backend options
    model_bytes = model.SerializeToString()
    outputs = [o.name for o in model.graph.output]

    options = CompilerOptions()
    options.ipu_version = runtime.DeviceManager().ipu_hardware_version()
    # Sharding into 4 IPUs
    options.num_ipus = 4
    # Enable Sharding and Pipelining
    options.enable_pipelining = True
    options.virtual_graph_mode = "manual"
    options.batches_per_step = 16

    executable = Compiler.compile(model_bytes, outputs, options)
    runner_config = runtime.RuntimeConfig()
    runner_config.timeout_ns = 0
    runner = runtime.Runner(executable, runner_config)
    return runner


def run(runner):
    inputs_info = runner.get_execute_inputs()
    outputs_info = runner.get_execute_outputs()

    inputs = {}
    for i in inputs_info:
        inputs[i.name] = np.ones(i.shape, dtype=i.numpy_data_type())

    outputs = {}
    for o in outputs_info:
        outputs[o.name] = np.zeros(o.shape, dtype=o.numpy_data_type())

    runner.execute(inputs, outputs)


if __name__ == '__main__':
    model = load_model()
    model = manual_sharding(model)
    runner = compile(model)
    run(runner)

Download shard.py