4. Command line interface
poprt is a tool to help quickly deploy ONNX models on IPUs.
usage: poprt
[-h]
[--available_memory_proportion AVAILABLE_MEMORY_PROPORTION]
[--batch_size BATCH_SIZE]
[--batch_axis BATCH_AXIS]
[--batches_per_step BATCHES_PER_STEP]
[--calibration_with_data]
[--calibration_loss_type {mse,mae,snr,kld,cos_dist,gptq}]
[--check]
[--checkpoints CHECKPOINTS]
[--compiler_options KEY=VAL [KEY=VAL ...]]
[--config_yaml CONFIG_YAML]
[--convert_version CONVERT_VERSION]
[--custom_library_so_paths CUSTOM_LIBRARY_SO_PATHS [CUSTOM_LIBRARY_SO_PATHS ...]]
[--custom_pass_config CUSTOM_PASS_CONFIG]
[--custom_shape_inference CUSTOM_SHAPE_INFERENCE]
[--data_preprocess DATA_PREPROCESS]
[--disable_compilation_progress_bar]
[--disable_fast_norm]
[--eightbitsio]
[--merge_if_with_same_cond]
[--enable_compress_pattern]
[--enable_erf_gelu]
[--enable_insert_remap]
[--export_popef]
[--fold_periodic_initializer]
[--fp16_skip_op_types FP16_SKIP_OP_TYPES]
[--enable_avoid_overflow_patterns]
[--fp8_skip_op_names FP8_SKIP_OP_NAMES]
[--fp8_params FP8_PARAMS]
[--framework FRAMEWORK]
[--infer_shape_ahead]
[-i INPUT_MODEL]
[--input_shape INPUT_TENSOR_NAME = INPUT_SHAPE [INPUT_TENSOR_NAME = INPUT_SHAPE ...]]
[--ipu_version {ipu2,ipu21}]
[--list_all_passes]
[--logging_level {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
[--manual_sharding_config MANUAL_SHARDING_CONFIG]
[--max_tensor_size MAX_TENSOR_SIZE]
[--num_io_tiles NUM_IO_TILES]
[--num_of_layers_keep_fp16 NUM_OF_LAYERS_KEEP_FP16]
[--only_manual_sharding]
[--optimize_internal_exchange_code]
[--output_dir OUTPUT_DIR]
[--output_model OUTPUT_MODEL]
[--pack_args KEY=VAL [KEY=VAL ...]]
[--passes PASSES]
[--perf_tuner]
[--popart_options KEY=VAL [KEY=VAL ...]]
[--precision {fp32,fp16,fp8,fp8_weight}]
[--precision_compare]
[--print_completion {bash,zsh}]
[--remap_mode REMAP_MODE]
[--remove_outputs REMOVE_OUTPUTS]
[--run]
[--serialize_matmul KEY=VAL [KEY=VAL ...]]
[--serialize_matmul_add KEY=VAL [KEY=VAL ...]]
[--merge_matmul MERGE_MATMUL]
[--merge_matmul_add MERGE_MATMUL_ADD]
[--merge_moe MERGE_MOE]
[--show]
[--skip_passes SKIP_PASSES]
[-v]
{tf2onnx}
...
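A typical invocation (model.onnx and the output names are placeholders) converts an FP32 ONNX model to FP16 and exports a PopEF file:

poprt \
    --input_model model.onnx \
    --output_model model_fp16.onnx \
    --output_dir ./output \
    --precision fp16 \
    --export_popef

All of the flags used here are described in Section 4.1.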
4.1. Named Arguments
- --available_memory_proportion
Set the available memory proportion for MatMul, Conv and Gemm Ops. Range (0, 1]. Default None.
- --batch_size
Set the batch size for all inputs. Works with the batch_axis parameter.
- --batch_axis
Specify the batch axis for all inputs. Works with the batch_size parameter.
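For example: --batch_size 4 --batch_axis 0 rebuilds the model with a batch size of 4 on dimension 0 of every input (an illustrative sketch; it assumes all inputs are batched on axis 0).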
- --batches_per_step
Set the number of mini-batches to perform on the device before returning to the host.
Default: 1
- --calibration_with_data
Calibrate the FP8 model using the calibration data. Note that this option only applies when precision is set to fp8 or fp8_weight.
Default: False
- --calibration_loss_type
Possible choices: mse, mae, snr, kld, cos_dist, gptq
Choose the calibration method. Note that gptq can only be used for calibration of fp8_weight.
Default: "kld"
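For example, a sketch of fp8 calibration (model.onnx is a placeholder, and it assumes the calibration data is made available to the converter, for example via --data_preprocess): poprt -i model.onnx --precision fp8 --calibration_with_data --calibration_loss_type mse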
- --check
Use randomly generated data to check that the model runs.
Default: False
- --checkpoints
Add intermediate tensors to the outputs of the graph in order to debug precision. Default None.
- --compiler_options
Set PopRT Compiler Options.
- --config_yaml
Set the path of the YAML config file. Default None.
- --convert_version
Convert the opset version of the ONNX model to CONVERT_VERSION.
Default: 11
- --custom_library_so_paths
Paths of custom shared libraries containing custom ops/patterns/transforms.
- --custom_pass_config
Path of the custom pass config file.
- --custom_shape_inference
Paths of the custom shape inference scripts. For example: --custom_shape_inference "./custom_shape_inference_1.py,../ops/custom_shape_inference_2.py".
- --data_preprocess
Path of pickle format file for data preprocessing.
- --disable_compilation_progress_bar
Do not show the compilation progress bar.
Default: False
- --disable_fast_norm
Do not convert layer_norm Ops to fast_norm Ops.
Default: False
- --eightbitsio
Enable 8-bit input/output.
Default: False
- --merge_if_with_same_cond
Enable merging of If Ops that use the same conditional input.
Default: False
- --enable_compress_pattern
Enable replacing Compress patterns with MaskCompress Ops.
Default: False
- --enable_erf_gelu
Enable replacing Erf-based Gelu patterns with the Gelu op.
Default: False
- --enable_insert_remap
Enable automatic insertion of remap Ops to improve tensor layout.
Default: False
- --export_popef
Enable the generation of PopEF model files in the conversion process.
Default: False
- --fold_periodic_initializer
Fold periodic initializers to save always-live memory.
Default: False
- --fp16_skip_op_types
Set the list of op types which will keep float32 operands in float16 mode. Default None.
- --enable_avoid_overflow_patterns
Keep float32 for several specific patterns in a float16 model to avoid overflow.
Default: False
- --fp8_skip_op_names
The names of ops which will remain as float32 or float16 in fp8 mode. For example: "Conv_1, Conv_2". Default None.
- --fp8_params
Set the parameters of the fp8 model. The format is "input_format,weight_format,input_scale,weight_scale".
Default: "F143,F143,-1,-1"
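For example: --fp8_params F143,F143,-1,-1 selects the F143 format for both inputs and weights with a scale of -1 for each (these values simply restate the default shown above).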
- --framework
Specify the frontend used to load the input model.
Default: "onnx"
- --infer_shape_ahead
Fix the input shapes and infer all shapes at the beginning of the conversion.
Default: False
- -i, --input_model
Set the path of the original ONNX model.
- --input_shape
Set the input shape of the model. If the model input is variable, we recommend setting the model input shape. For example: --input_shape input_ids=1,512 attention_mask=1,512.
- --ipu_version
Possible choices: ipu2, ipu21
Set the IPU version: use ipu21 for C600 systems and ipu2 for IPU-M2000 and Bow-2000 systems.
Default: "ipu2"
- --list_all_passes
List all passes. Refer to --passes.
Default: False
- --logging_level
Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL
Set the logging level.
Default: "WARNING"
- --manual_sharding_config
Set the path of the YAML config file for sharding and pipelining. Default None.
- --max_tensor_size
Set the max tensor size (in bytes) that can be generated by constant folding. -1 means no limit. For example: --max_tensor_size 41943040 means constant folding can only generate tensors smaller than 40 MB.
Default: -1
- --num_io_tiles
Set the number of IPU tiles dedicated to IO. The IPU runs in OverlapIO mode if this number is greater than 0. For more information about OverlapIO, see the PopART user guide: https://docs.graphcore.ai/projects/popart-user-guide/en/latest/overlap_io.html.
Default: 0
- --num_of_layers_keep_fp16
Keep the NUM_OF_LAYERS_KEEP_FP16 layers with the largest quantization loss in fp16 during fp8 quantization.
Default: 0
- --only_manual_sharding
Only shard the graph in the CLI. If only_manual_sharding is enabled, the CLI only supports --input_model, --output_model, --output_dir and --manual_sharding_config. --output_model and --output_dir are optional.
Default: False
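For example, a sharding-only sketch (model.onnx and sharding.yaml are placeholder paths): poprt -i model.onnx --only_manual_sharding --manual_sharding_config sharding.yaml --output_model sharded.onnx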
- --optimize_internal_exchange_code
Optimize the memory usage of internal exchange code.
Default: False
- --output_dir
Set the output directory where the converted model files and PopEF files are saved. Default current directory.
Default: "./"
- --output_model
Set the name of the converted ONNX model. This will be placed in the --output_dir directory.
- --pack_args
Set the pack args. For example: --pack_args max_valid_num=50 enable_double_batch_unpack=false segment_max_size=13+51
- --passes
Set the passes to be used during conversion. Default None. For example: --passes "pre_scale,fuse_attention". Refer to --list_all_passes, which lists all available passes.
- --perf_tuner
Enable the performance tuner (not yet implemented).
Default: False
- --popart_options
Set PopART Session Options. For more information: https://docs.graphcore.ai/projects/popart-python-api/en/latest/api-python.html?highlight=POPART#session-options.
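For example (a sketch; it assumes the key matches a PopART SessionOptions attribute, here partialsTypeMatMuls): --popart_options partialsTypeMatMuls=half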
- --precision
Possible choices: fp32, fp16, fp8, fp8_weight
Quantize the model to the specified precision.
Default: "fp32"
- --precision_compare
Compare the output precision of Conv/MatMul/Gemm Ops between the original and the converted model. Note that this only takes effect when precision is set to fp8 or fp8_weight.
Default: False
- --print_completion
Possible choices: bash, zsh
Print the shell completion script.
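For example: poprt --print_completion bash prints a completion script for bash; sourcing that script in your shell (the usual pattern for such scripts, though not specified here) enables tab completion for poprt.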
- --remap_mode
Set the insert position of remap. Valid only if enable_insert_remap is set. Must follow the template 'before/after' + '_' + 'op_type' (such as after_matmul, before_softmax, after_concat).
Default: "after_matmul"
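For example: --enable_insert_remap --remap_mode before_softmax inserts a remap before each Softmax op.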
- --remove_outputs
Remove the specific outputs and useless structures from the graph.
- --run
Run PopEF with random data.
Default: False
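For example (a sketch; model.onnx is a placeholder, and it assumes --export_popef is used to produce the PopEF to run): poprt -i model.onnx --export_popef --run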
- --serialize_matmul
Serialize MatMul Ops to save on-chip memory. Usage: --serialize_matmul ${OP_NAME}=${FACTOR}/${MODE}/${KEEP_PRECISION}, --serialize_matmul ${OP_NAME}=${FACTOR}/${MODE} or --serialize_matmul ${OP_NAME}=${FACTOR}. ${MODE} choices: [input_channels, output_channels, reducing_dim, none]; default output_channels. ${KEEP_PRECISION} choices: [True, False]; default False. For example: --serialize_matmul MatMul_1=4/input_channels/True MatMul_2=4/input_channels MatMul_3=4
- --serialize_matmul_add
Serialize MatMul weights and Add bias along the last dimension of the weights to save on-chip memory. Usage: --serialize_matmul_add ${MATMUL_OP_NAME}/${ADD_OP_NAME}=${FACTOR}. For example: --serialize_matmul_add MatMul_1/Add_2=4
- --merge_matmul
Merge MatMul operations to save cycles. Usage: --merge_matmul ${MATMUL_OP_NAME1},${MATMUL_OP_NAME2}. For example: --merge_matmul MatMul_1,MatMul_2
- --merge_matmul_add
Merge MatMul/Add operations to save cycles. Usage: --merge_matmul_add ${MATMUL_OP_NAME1},${ADD_OP_NAME1},${MATMUL_OP_NAME2},${ADD_OP_NAME2}. For example: --merge_matmul_add MatMul_1,Add_1,MatMul_2,Add_2
- --merge_moe
Merge Mixture-of-Experts structures to save cycles. Usage: --merge_moe ${EXPERT_BEGIN_OP_NAME1},${EXPERT_END_OP_NAME1},${EXPERT_BEGIN_OP_NAME2},${EXPERT_END_OP_NAME2}. For example: --merge_moe MatMul_1,Add_1,MatMul_2,Add_2
- --show
Show the input and output information of the model.
Default: False
- --skip_passes
Set the list of passes that will be skipped. Default None.
- -v, --version
Show the version of the tool.
Default: False
4.2. Sub-commands
4.2.1. tf2onnx
Convert a TensorFlow model to ONNX.
poprt tf2onnx [-h] [--saved_model] [--signature_def SIGNATURE_DEF] [--tag TAG]
[--inputs INPUTS] [--outputs OUTPUTS] [--opset OPSET]
[--inputs_as_nchw INPUTS_AS_NCHW]
[--outputs_as_nchw OUTPUTS_AS_NCHW]
Named Arguments
- --saved_model
Specify that the input model is a TensorFlow saved_model. Use --input_model to specify the model path.
Default: False
- --signature_def
The signature_def of the saved_model to use.
- --tag
The tag to use for the saved_model.
- --inputs
The model input names (optional for saved_model).
- --outputs
The model output names (optional for saved_model).
- --opset
The opset version to use for the ONNX domain in the TensorFlow frontend.
Default: 11
- --inputs_as_nchw
Transpose the named inputs from NHWC to NCHW.
- --outputs_as_nchw
Transpose the named outputs from NHWC to NCHW.
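For example, a sketch of converting a TensorFlow saved_model (the paths and the serve tag are placeholders; per the synopsis above, the main options come before the tf2onnx sub-command):

poprt \
    --input_model ./my_saved_model \
    --output_model model.onnx \
    tf2onnx --saved_model --tag serve --opset 11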