4. Command line interface

poprt is a tool to help quickly deploy ONNX models on IPUs.

usage: poprt
       [-h]
       [--available_memory_proportion AVAILABLE_MEMORY_PROPORTION]
       [--batch_size BATCH_SIZE]
       [--batch_axis BATCH_AXIS]
       [--batches_per_step BATCHES_PER_STEP]
       [--calibration_with_data]
       [--calibration_loss_type {mse,mae,snr,kld,cos_dist,gptq}]
       [--check]
       [--checkpoints CHECKPOINTS]
       [--compiler_options KEY=VAL [KEY=VAL ...]]
       [--config_yaml CONFIG_YAML]
       [--convert_version CONVERT_VERSION]
       [--custom_library_so_paths CUSTOM_LIBRARY_SO_PATHS [CUSTOM_LIBRARY_SO_PATHS ...]]
       [--custom_pass_config CUSTOM_PASS_CONFIG]
       [--custom_shape_inference CUSTOM_SHAPE_INFERENCE]
       [--data_preprocess DATA_PREPROCESS]
       [--disable_compilation_progress_bar]
       [--disable_fast_norm]
       [--eightbitsio]
       [--merge_if_with_same_cond]
       [--enable_compress_pattern]
       [--enable_erf_gelu]
       [--enable_insert_remap]
       [--export_popef]
       [--fold_periodic_initializer]
       [--fp16_skip_op_types FP16_SKIP_OP_TYPES]
       [--enable_avoid_overflow_patterns]
       [--fp8_skip_op_names FP8_SKIP_OP_NAMES]
       [--fp8_params FP8_PARAMS]
       [--framework FRAMEWORK]
       [--infer_shape_ahead]
       [-i INPUT_MODEL]
       [--input_shape INPUT_TENSOR_NAME=INPUT_SHAPE [INPUT_TENSOR_NAME=INPUT_SHAPE ...]]
       [--ipu_version {ipu2,ipu21}]
       [--list_all_passes]
       [--logging_level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
       [--manual_sharding_config MANUAL_SHARDING_CONFIG]
       [--max_tensor_size MAX_TENSOR_SIZE]
       [--num_io_tiles NUM_IO_TILES]
       [--num_of_layers_keep_fp16 NUM_OF_LAYERS_KEEP_FP16]
       [--only_manual_sharding]
       [--optimize_internal_exchange_code]
       [--output_dir OUTPUT_DIR]
       [--output_model OUTPUT_MODEL]
       [--pack_args KEY=VAL [KEY=VAL ...]]
       [--passes PASSES]
       [--perf_tuner]
       [--popart_options KEY=VAL [KEY=VAL ...]]
       [--precision {fp32,fp16,fp8,fp8_weight}]
       [--precision_compare]
       [--print_completion {bash,zsh}]
       [--remap_mode REMAP_MODE]
       [--remove_outputs REMOVE_OUTPUTS]
       [--run]
       [--serialize_matmul KEY=VAL [KEY=VAL ...]]
       [--serialize_matmul_add KEY=VAL [KEY=VAL ...]]
       [--merge_matmul MERGE_MATMUL]
       [--merge_matmul_add MERGE_MATMUL_ADD]
       [--merge_moe MERGE_MOE]
       [--show]
       [--skip_passes SKIP_PASSES]
       [-v]
       {tf2onnx}
       ...
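
For example, a typical conversion might look like this (a minimal sketch; the model file names are hypothetical):

poprt \
    --input_model model.onnx \
    --output_model model_fp16.onnx \
    --precision fp16 \
    --export_popef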

4.1. Named Arguments

--available_memory_proportion

Set the available memory proportion for MatMul, Conv and Gemm Ops. Range (0, 1]. Default None.
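
For example, to set the proportion to 0.3 for all MatMul, Conv and Gemm Ops (model path hypothetical):

poprt -i model.onnx --available_memory_proportion 0.3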

--batch_size

Set the batch size for all inputs. Works with the batch_axis parameter.

--batch_axis

Specify the batch axis for all inputs. Works with the batch_size parameter.
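
For example, to set a batch size of 4 along axis 0 for all inputs (model path hypothetical):

poprt -i model.onnx --batch_size 4 --batch_axis 0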

--batches_per_step

Set the number of mini-batches to perform on the device before returning to the host. Default: 1.

Default: 1

--calibration_with_data

Calibrate the FP8 model using the calibration data. Note that this option only applies when precision is set to fp8 or fp8_weight.

Default: False

--calibration_loss_type

Possible choices: mse, mae, snr, kld, cos_dist, gptq

Choose the calibration method. Note that gptq can only be used for calibration of fp8_weight. Default kld.

Default: “kld”
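
A sketch combining the calibration options for fp8 quantization (model path hypothetical; how the calibration data is supplied, for example via --data_preprocess, depends on your workflow):

poprt -i model.onnx --precision fp8 --calibration_with_data --calibration_loss_type mse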

--check

Check that the model runs, using randomly generated data.

Default: False

--checkpoints

Add intermediate tensors to the outputs of the graph in order to debug precision. Default None.

--compiler_options

Set PopRT Compiler Options.

--config_yaml

Set the path of the yaml config file. Default None.

--convert_version

Convert the opset version of ONNX model to CONVERT_VERSION. Default 11.

Default: 11

--custom_library_so_paths

Paths of the custom shared libraries containing custom ops/patterns/transforms.

--custom_pass_config

Path of the custom pass config file.

--custom_shape_inference

Paths of the custom shape inference scripts. For example: --custom_shape_inference "./custom_shape_inference_1.py,../ops/custom_shape_inference_2.py".

--data_preprocess

Path of pickle format file for data preprocessing.

--disable_compilation_progress_bar

Do not show compilation progress bar.

Default: False

--disable_fast_norm

Do not convert layer_norm Ops to fast_norm Ops.

Default: False

--eightbitsio

Enable 8-bit input/output.

Default: False

--merge_if_with_same_cond

Enable merging of If Ops that use the same conditional input.

Default: False

--enable_compress_pattern

Enable replacing Compress patterns with MaskCompress Ops.

Default: False

--enable_erf_gelu

Enable replacing Erf-based Gelu patterns with the Gelu Op.

Default: False

--enable_insert_remap

Enable automatic insertion of remap Ops to improve tensor layout.

Default: False

--export_popef

Enable the generation of PopEF model files in the conversion process.

Default: False

--fold_periodic_initializer

Fold periodic initializers to save always-live memory.

Default: False

--fp16_skip_op_types

Set the list of op types which will keep float32 operands in float16 mode. Default None.

--enable_avoid_overflow_patterns

Keep float32 for several specific overflow-prone patterns in a float16 model.

Default: False

--fp8_skip_op_names

The names of ops which will remain as float32 or float16 in fp8 mode. For example: "Conv_1, Conv_2". Default None.

--fp8_params

Set the parameters of the fp8 model. The format is "input_format,weight_format,input_scale,weight_scale".

Default: “F143,F143,-1,-1”
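
For example, to pass the default formats and scales explicitly (model path hypothetical):

poprt -i model.onnx --precision fp8 --fp8_params "F143,F143,-1,-1"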

--framework

Specify the frontend used to load the input model.

Default: “onnx”

--infer_shape_ahead

Fix the input shapes and run shape inference at the beginning of conversion.

Default: False

-i, --input_model

Set the path of the original ONNX model.

--input_shape

Set the input shape of the model. If the model input is variable, we recommend setting the model input shape. For example: --input_shape input_ids=1,512 attention_mask=1,512.

--ipu_version

Possible choices: ipu2, ipu21

Set the IPU version: use ipu21 for C600 systems and ipu2 for IPU-M2000 and Bow-2000 systems. Default ipu2.

Default: “ipu2”

--list_all_passes

List all passes. Refer to --passes.

Default: False

--logging_level

Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL

Set the logging level. Default WARNING.

Default: “WARNING”

--manual_sharding_config

Set the path of the yaml config file of sharding and pipelining. Default None.

--max_tensor_size

Set the maximum tensor size (in bytes) that constant_folding can generate. The default of -1 means no limit. For example: --max_tensor_size 41943040 means constant_folding can only generate tensors smaller than 40 MB.

Default: -1

--num_io_tiles

Set the number of IPU tiles dedicated to IO. Default 0. The IPU runs in OverlapIO mode if this number is greater than 0. For more information about OverlapIO, see the PopART user guide: https://docs.graphcore.ai/projects/popart-user-guide/en/latest/overlap_io.html.

Default: 0

--num_of_layers_keep_fp16

Set the number of layers with the largest (top-k) quantization loss to keep in fp16 during fp8 quantization.

Default: 0

--only_manual_sharding

Only shard the graph in the CLI. If only_manual_sharding is enabled, the CLI only supports --input_model, --output_model, --output_dir and --manual_sharding_config. --output_model and --output_dir are optional.

Default: False
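
A sketch of a sharding-only invocation (file names hypothetical):

poprt -i model.onnx --only_manual_sharding --manual_sharding_config sharding.yaml --output_dir ./sharded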

--optimize_internal_exchange_code

Enable optimization of the memory usage of internal exchange code.

Default: False

--output_dir

Set the output directory where the converted model files and PopEF files are saved. Default current directory.

Default: “./”

--output_model

Set the name of the converted ONNX model. This will be placed in the --output_dir directory.
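
For example, to write the converted model and its PopEF file to a chosen directory (names hypothetical):

poprt -i model.onnx --output_dir ./converted --output_model model_converted.onnx --export_popef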

--pack_args

Set the pack args. For example: --pack_args max_valid_num=50 enable_double_batch_unpack=false segment_max_size=13+51

--passes

Set the passes to be used during conversion. Default None. For example: --passes "pre_scale,fuse_attention". Refer to --list_all_passes, which shows all available passes.

--perf_tuner

Enable the performance tuner (not yet implemented).

Default: False

--popart_options

Set PopART Session Options. For more information: https://docs.graphcore.ai/projects/popart-python-api/en/latest/api-python.html?highlight=POPART#session-options.
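
Options are passed as KEY=VAL pairs. For example, assuming enableEngineCaching is a valid session option in your PopART version (model path hypothetical):

poprt -i model.onnx --popart_options enableEngineCaching=true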

--precision

Possible choices: fp32, fp16, fp8, fp8_weight

Quantize the model to the specified precision. Default fp32.

Default: “fp32”

--precision_compare

Compare the output precision of Conv/MatMul/Gemm Ops between the original and the converted model. Note that this only takes effect when precision is set to fp8 or fp8_weight.

Default: False

--print_completion

Possible choices: bash, zsh

Print the shell completion script.
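
For example, to generate and load a bash completion script (file name hypothetical):

poprt --print_completion bash > poprt-completion.bash
source poprt-completion.bash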

--remap_mode

Set the insertion position of remap Ops. Valid only if enable_insert_remap is set. Must follow the template 'before/after' + '_' + 'op_type', such as after_matmul, before_softmax or after_concat.

Default: “after_matmul”
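
For example, to insert remaps before Softmax Ops (model path hypothetical):

poprt -i model.onnx --enable_insert_remap --remap_mode before_softmax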

--remove_outputs

Remove the specified outputs and any resulting unused structures from the graph.

--run

Run PopEF with random data.

Default: False

--serialize_matmul

Serialize MatMul Ops to save on-chip memory. Usage: --serialize_matmul ${OP_NAME}=${FACTOR}/${MODE}/${KEEP_PRECISION}, --serialize_matmul ${OP_NAME}=${FACTOR}/${MODE} or --serialize_matmul ${OP_NAME}=${FACTOR}. ${MODE} choices: [input_channels, output_channels, reducing_dim, none]; default output_channels. ${KEEP_PRECISION} choices: [True, False]; default False. For example: --serialize_matmul MatMul_1=4/input_channels/True MatMul_2=4/input_channels MatMul_3=4

--serialize_matmul_add

Serialize MatMul weights and the Add bias along the weights' last dimension to save on-chip memory. Usage: --serialize_matmul_add ${MATMUL_OP_NAME}/${ADD_OP_NAME}=${FACTOR}. For example: --serialize_matmul_add MatMul_1/Add_2=4

--merge_matmul

Merge MatMul operations to save cycles. Usage: --merge_matmul ${MATMUL_OP_NAME1},${MATMUL_OP_NAME2}. For example: --merge_matmul MatMul_1,MatMul_2

--merge_matmul_add

Merge MatMul/Add operations to save cycles. Usage: --merge_matmul_add ${MATMUL_OP_NAME1},${ADD_OP_NAME1},${MATMUL_OP_NAME2},${ADD_OP_NAME2}. For example: --merge_matmul_add MatMul_1,Add_1,MatMul_2,Add_2

--merge_moe

Merge Mixture-of-Experts structures to save cycles. Usage: --merge_moe ${EXPERT_BEGIN_OP_NAME1},${EXPERT_END_OP_NAME1},${EXPERT_BEGIN_OP_NAME2},${EXPERT_END_OP_NAME2}. For example: --merge_moe MatMul_1,Add_1,MatMul_2,Add_2

--show

Show the input and output information of the model.

Default: False

--skip_passes

Set the list of passes that will be skipped. Default None.

-v, --version

Output the version of the tool.

Default: False

4.2. Sub-commands

4.2.1. tf2onnx

Convert a TensorFlow model to ONNX.

poprt tf2onnx [-h] [--saved_model] [--signature_def SIGNATURE_DEF] [--tag TAG]
              [--inputs INPUTS] [--outputs OUTPUTS] [--opset OPSET]
              [--inputs_as_nchw INPUTS_AS_NCHW]
              [--outputs_as_nchw OUTPUTS_AS_NCHW]

Named Arguments

--saved_model

Specify that the input model is a TensorFlow saved_model. Use --input_model to specify the model path.

Default: False

--signature_def

The signature_def from the saved_model to use.

--tag

The tag to use for the saved_model.

--inputs

The model input names (optional for saved_model).

--outputs

The model output names (optional for saved_model).

--opset

The opset version to use for the ONNX domain in the TensorFlow frontend.

Default: 11

--inputs_as_nchw

Transpose the inputs from NHWC to NCHW.

--outputs_as_nchw

Transpose the outputs from NHWC to NCHW.
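
A sketch of a full invocation for a TensorFlow saved_model (paths, tag and tensor names hypothetical; exact argument placement may vary):

poprt -i ./my_saved_model tf2onnx --saved_model --tag serve --opset 11 --inputs input:0 --outputs output:0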