4. Command line interface

poprt is a tool to help quickly deploy ONNX models on IPUs.

usage: poprt
       [--available_memory_proportion AVAILABLE_MEMORY_PROPORTION]
       [--batch_size BATCH_SIZE]
       [--batch_axis BATCH_AXIS]
       [--batches_per_step BATCHES_PER_STEP]
       [--calibration_loss_type {mse,mae,snr,kld,cos_dist,gptq}]
       [--checkpoints CHECKPOINTS]
       [--compiler_options KEY=VAL [KEY=VAL ...]]
       [--config_yaml CONFIG_YAML]
       [--convert_version CONVERT_VERSION]
       [--custom_library_so_paths CUSTOM_LIBRARY_SO_PATHS [CUSTOM_LIBRARY_SO_PATHS ...]]
       [--custom_pass_config CUSTOM_PASS_CONFIG]
       [--custom_shape_inference CUSTOM_SHAPE_INFERENCE]
       [--data_preprocess DATA_PREPROCESS]
       [--fp16_skip_op_types FP16_SKIP_OP_TYPES]
       [--fp8_skip_op_names FP8_SKIP_OP_NAMES]
       [--fp8_params FP8_PARAMS]
       [--framework FRAMEWORK]
       [-i INPUT_MODEL]
       [--ipu_version {ipu2,ipu21}]
       [--logging_level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
       [--manual_sharding_config MANUAL_SHARDING_CONFIG]
       [--max_tensor_size MAX_TENSOR_SIZE]
       [--num_io_tiles NUM_IO_TILES]
       [--num_of_layers_keep_fp16 NUM_OF_LAYERS_KEEP_FP16]
       [--output_dir OUTPUT_DIR]
       [--output_model OUTPUT_MODEL]
       [--pack_args KEY=VAL [KEY=VAL ...]]
       [--passes PASSES]
       [--popart_options KEY=VAL [KEY=VAL ...]]
       [--precision {fp32,fp16,fp8,fp8_weight}]
       [--print_completion {bash,zsh}]
       [--remap_mode REMAP_MODE]
       [--remove_outputs REMOVE_OUTPUTS]
       [--serialize_matmul KEY=VAL [KEY=VAL ...]]
       [--serialize_matmul_add KEY=VAL [KEY=VAL ...]]
       [--merge_matmul MERGE_MATMUL]
       [--merge_matmul_add MERGE_MATMUL_ADD]
       [--merge_moe MERGE_MOE]
       [--skip_passes SKIP_PASSES]
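
A typical invocation might look like the following sketch (the model file names are placeholders, and the exact flag combination depends on your model; all flags used here are documented below):

```shell
# Hypothetical example: convert model.onnx to FP16 for a C600 (ipu21) system.
poprt \
    --input_model model.onnx \
    --input_shape input_ids=1,512 \
    --precision fp16 \
    --ipu_version ipu21 \
    --output_dir ./converted \
    --output_model model_fp16.onnx
```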

4.1. Named Arguments


--available_memory_proportion

Set the available memory proportion for MatMul, Conv and Gemm Ops. Range (0, 1]. Default None.


--batch_size

Set the batch size for all inputs. Works with the --batch_axis parameter.


--batch_axis

Specify the batch axis for all inputs. Works with the --batch_size parameter.


--batches_per_step

Set the number of mini-batches to perform on the device before returning to the host. Default: 1.

Default: 1


Calibrate the FP8 model using the calibration data. Note that this option only applies when precision is set to fp8 or fp8_weight.

Default: False


--calibration_loss_type

Possible choices: mse, mae, snr, kld, cos_dist, gptq

Choose the calibration method. Note that gptq can only be used for calibration of fp8_weight. Default is kld.

Default: “kld”


Use made-up data to check that the model runs.

Default: False


--checkpoints

Add intermediate tensors to the outputs of the graph in order to debug precision. Default None.


--compiler_options

Set PopRT Compiler Options.


--config_yaml

Set the path of the YAML config file. Default None.


--convert_version

Convert the opset version of the ONNX model to CONVERT_VERSION. Default 11.

Default: 11


--custom_library_so_paths

Paths of the custom shared libraries with custom ops/patterns/transforms.


--custom_pass_config

Path of the custom pass config file.


--custom_shape_inference

Paths of the custom shape inference scripts. For example: --custom_shape_inference "./custom_shape_inference_1.py,../ops/custom_shape_inference_2.py".


--data_preprocess

Path of the pickle-format file for data preprocessing.


Do not show compilation progress bar.

Default: False


Do not convert layer_norm Ops to fast_norm Ops.

Default: False


Enable 8-bit input/output.

Default: False


Enable merging of If Ops that use the same conditional input.

Default: False


Enable replacing Compress patterns with MaskCompress Ops.

Default: False


Enable replacing Erf Gelu patterns with Gelu Ops.

Default: False


--enable_insert_remap

Automatically insert remap Ops to improve tensor layout.

Default: False


Enable the generation of PopEF model files in the conversion process.

Default: False


Fold periodic initializers to save Always-Live memory.

Default: False


--fp16_skip_op_types

Set the list of op types which will keep float32 operands in float16 mode. Default None.


Enable keeping float32 for several specific patterns in a float16 model.

Default: False


--fp8_skip_op_names

The names of ops which will remain as float32 or float16 in fp8 mode. For example: "Conv_1, Conv_2". Default None.


--fp8_params

Set the parameters of the fp8 model; the format is "input_format,weight_format,input_scale,weight_scale".

Default: “F143,F143,-1,-1”
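
The four comma-separated fields can be split mechanically. A minimal sketch, assuming the scales are integers as in the default; the helper name is ours, not part of PopRT:

```python
def parse_fp8_params(spec: str) -> dict:
    """Split the documented "input_format,weight_format,input_scale,weight_scale" string."""
    input_format, weight_format, input_scale, weight_scale = spec.split(",")
    return {
        "input_format": input_format,
        "weight_format": weight_format,
        "input_scale": int(input_scale),
        "weight_scale": int(weight_scale),
    }

# The documented default value:
params = parse_fp8_params("F143,F143,-1,-1")
```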


--framework

Specify the frontend used to load the input model.

Default: “onnx”


Fix the input shapes and infer shapes at the beginning.

Default: False

-i, --input_model

Set the path of the original ONNX model.


--input_shape

Set the input shape of the model. If the model input is variable, we recommend setting the model input shape. For example: --input_shape input_ids=1,512 attention_mask=1,512.
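
Each shape specification is simply a tensor name, an equals sign, and comma-separated dimensions. A sketch of how such specs decompose (the helper name is illustrative, not part of PopRT):

```python
def parse_input_shapes(specs: list) -> dict:
    """Map "name=d1,d2,..." specs to a tensor-name -> dimension-list dict."""
    shapes = {}
    for spec in specs:
        name, dims = spec.split("=")
        shapes[name] = [int(d) for d in dims.split(",")]
    return shapes

# The documented example: --input_shape input_ids=1,512 attention_mask=1,512
shapes = parse_input_shapes(["input_ids=1,512", "attention_mask=1,512"])
```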


--ipu_version

Possible choices: ipu2, ipu21

Set the IPU version: use ipu21 for C600 systems and ipu2 for IPU-M2000 and Bow-2000 systems. Default ipu2.

Default: “ipu2”


--list_all_passes

List all passes. Refer to --passes.

Default: False



--logging_level

Set the logging level. Default WARNING.

Default: “WARNING”


--manual_sharding_config

Set the path of the YAML config file for sharding and pipelining. Default None.


--max_tensor_size

Set the max tensor size (bytes) generated by constant folding. -1 means max_tensor_size is not set. For example: --max_tensor_size 41943040 means constant folding can only generate tensors smaller than 40MB.

Default: -1
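
The example threshold above is just 40 MB expressed in bytes:

```python
# --max_tensor_size is given in bytes; the documented example value
# 41943040 corresponds to 40 MB (using 1 MB = 1024 * 1024 bytes).
max_tensor_size = 40 * 1024 * 1024
assert max_tensor_size == 41943040
```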


--num_io_tiles

Set the number of IPU tiles dedicated to IO. Default 0. The IPU runs in OverlapIO mode if this number is greater than 0. For more information about OverlapIO, see the PopART user guide: https://docs.graphcore.ai/projects/popart-user-guide/en/latest/overlap_io.html.

Default: 0


--num_of_layers_keep_fp16

Set the layers whose loss is in the top k to fp16 in fp8 quantization.

Default: 0


--only_manual_sharding

Only shard the graph in the CLI. If only_manual_sharding is enabled, the CLI only supports --input_model, --output_model, --output_dir and --manual_sharding_config. --output_model and --output_dir are optional.

Default: False


Enable optimization of the memory usage of internal exchange code.

Default: False


--output_dir

Set the output directory where the converted model files and PopEF files are saved. Default: current directory.

Default: “./”


--output_model

Set the name of the converted ONNX model. This will be placed in the --output_dir directory.


--pack_args

Set the pack args. For example: --pack_args max_valid_num=50 enable_double_batch_unpack=false segment_max_size=13+51
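
The KEY=VAL pairs decompose mechanically; values stay strings, since their interpretation (for example, the meaning of segment_max_size=13+51) is tool-specific. The helper name is illustrative, not part of PopRT:

```python
def parse_pack_args(pairs: list) -> dict:
    """Split KEY=VAL pairs into a dict of string values."""
    return dict(pair.split("=", 1) for pair in pairs)

# The documented example pairs:
args = parse_pack_args(
    ["max_valid_num=50", "enable_double_batch_unpack=false", "segment_max_size=13+51"]
)
```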


--passes

Set the passes to be used during conversion. Default None. For example: --passes "pre_scale,fuse_attention". Refer to --list_all_passes, which shows all available passes.


Enable the performance tuner (not implemented yet).

Default: False


--popart_options

Set PopART Session Options. For more information, see: https://docs.graphcore.ai/projects/popart-python-api/en/latest/api-python.html?highlight=POPART#session-options.


--precision

Possible choices: fp32, fp16, fp8, fp8_weight

Quantize the model to the specified precision. Default fp32.

Default: “fp32”


Compare the output precision of Conv/MatMul/Gemm Ops between the original and the converted model. Note that this only takes effect when precision is set to fp8 or fp8_weight.

Default: False


--print_completion

Possible choices: bash, zsh

Print the shell completion script.


--remap_mode

Set the insert position of remap; valid only if enable_insert_remap is set. Must follow the template 'before'/'after' + '_' + op_type (such as after_matmul, before_softmax, after_concat).

Default: “after_matmul”


--remove_outputs

Remove the specified outputs and useless structures from the graph.


Run PopEF with random data.

Default: False


--serialize_matmul

Serialize MatMul Ops to save on-chip memory. Format: --serialize_matmul ${OP_NAME}=${FACTOR}/${MODE}/${KEEP_PRECISION}, --serialize_matmul ${OP_NAME}=${FACTOR}/${MODE} or --serialize_matmul ${OP_NAME}=${FACTOR}. ${MODE} choices: [input_channels, output_channels, reducing_dim, none]; default is output_channels. ${KEEP_PRECISION} choices: [True, False]; default is False. For example: --serialize_matmul MatMul_1=4/input_channels/True MatMul_2=4/input_channels MatMul_3=4
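
The spec format above, with its optional trailing fields and documented defaults, can be sketched as a small parser (the helper name is illustrative, not part of PopRT):

```python
def parse_serialize_matmul(spec: str) -> tuple:
    """Parse ${OP_NAME}=${FACTOR}[/${MODE}[/${KEEP_PRECISION}]] with documented defaults."""
    op_name, rest = spec.split("=")
    parts = rest.split("/")
    factor = int(parts[0])
    mode = parts[1] if len(parts) > 1 else "output_channels"  # documented default
    keep_precision = (parts[2] == "True") if len(parts) > 2 else False  # documented default
    return op_name, factor, mode, keep_precision

# The three documented example forms:
full = parse_serialize_matmul("MatMul_1=4/input_channels/True")
no_keep = parse_serialize_matmul("MatMul_2=4/input_channels")
factor_only = parse_serialize_matmul("MatMul_3=4")
```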


--serialize_matmul_add

Serialize MatMul weights and Add bias along the weights' last dimension to save on-chip memory. Format: --serialize_matmul_add ${MATMUL_OP_NAME}/${ADD_OP_NAME}=${FACTOR}. For example: --serialize_matmul_add MatMul_1/Add_2=4


--merge_matmul

Merge MatMul operations to save cycles. Format: --merge_matmul ${MATMUL_OP_NAME1},${MATMUL_OP_NAME2}. For example: --merge_matmul MatMul_1,MatMul_2


--merge_matmul_add

Merge MatMul/Add operations to save cycles. Format: --merge_matmul_add ${MATMUL_OP_NAME1},${ADD_OP_NAME1},${MATMUL_OP_NAME2},${ADD_OP_NAME2}. For example: --merge_matmul_add MatMul_1,Add_1,MatMul_2,Add_2


--merge_moe

Merge Mixture-of-Experts structures to save cycles. Format: --merge_moe ${EXPERT_BEGIN_OP_NAME1},${EXPERT_END_OP_NAME1},${EXPERT_BEGIN_OP_NAME2},${EXPERT_END_OP_NAME2}. For example: --merge_moe MatMul_1,Add_1,MatMul_2,Add_2


Show the input and output information of the model.

Default: False


--skip_passes

Set the list of passes that will be skipped. Default None.

-v, --version

Show the version of the tool.

Default: False

4.2. Sub-commands

4.2.1. tf2onnx

Convert a TensorFlow model to ONNX.

poprt tf2onnx [-h] [--saved_model] [--signature_def SIGNATURE_DEF] [--tag TAG]
              [--inputs INPUTS] [--outputs OUTPUTS] [--opset OPSET]
              [--inputs_as_nchw INPUTS_AS_NCHW]
              [--outputs_as_nchw OUTPUTS_AS_NCHW]

Named Arguments


--saved_model

Specify whether the input model is a saved_model. Use --input_model to specify the model path.

Default: False


--signature_def

The signature_def from the saved_model to use.


--tag

The tag to use for the saved_model.


--inputs

Model input names (optional for saved_model).


--outputs

Model output names (optional for saved_model).


--opset

The opset version to use for the ONNX domain in the TensorFlow frontend.

Default: 11


--inputs_as_nchw

Transpose the given inputs from NHWC to NCHW.


--outputs_as_nchw

Transpose the given outputs from NHWC to NCHW.