6. gc-flops

This tool benchmarks the number of floating-point operations per second (FLOPS) achieved on one or more IPUs. It supports Mk2 architectures only.

For example:

$ gc-flops -d 0 -p FP32
This test can take up to 3 minutes to run.
Loading binary into IPU(s)
Starting FP32 FLOPS calculation...
Flops per IPU (FP32): 87.1078 teraFLOPS (took: 27.8703 seconds)
IPU clock speed: 1850 MHz
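
The text report above can be read by a script. The following is a minimal Python sketch, assuming the output lines keep exactly the layout shown in the example. The function name `parse_gc_flops_output` and the dictionary keys it returns are hypothetical and are not part of the tool.

```python
import re

# Patterns written against the sample output shown above. They are an
# assumption about the layout, not a documented format.
FLOPS_RE = re.compile(
    r"Flops per IPU \((?P<precision>\w+)\): (?P<tflops>[\d.]+) teraFLOPS "
    r"\(took: (?P<seconds>[\d.]+) seconds\)"
)
CLOCK_RE = re.compile(r"IPU clock speed: (?P<mhz>[\d.]+) MHz")


def parse_gc_flops_output(text):
    """Extract precision, teraFLOPS, run time and clock speed from a report."""
    flops = FLOPS_RE.search(text)
    clock = CLOCK_RE.search(text)
    if flops is None or clock is None:
        raise ValueError("output does not match the expected layout")
    return {
        "precision": flops.group("precision"),
        "teraflops": float(flops.group("tflops")),
        "seconds": float(flops.group("seconds")),
        "clock_mhz": float(clock.group("mhz")),
    }


# The report lines copied from the example run above.
SAMPLE = """\
This test can take up to 3 minutes to run.
Loading binary into IPU(s)
Starting FP32 FLOPS calculation...
Flops per IPU (FP32): 87.1078 teraFLOPS (took: 27.8703 seconds)
IPU clock speed: 1850 MHz
"""

print(parse_gc_flops_output(SAMPLE))
```

If the tool's wording differs between versions, the patterns would need adjusting. The parser raises `ValueError` rather than returning partial results.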

6.1. Precision

The -p and --fp options select the floating-point precision: FP16 (16-bit, half precision), which is the default, or FP32 (32-bit, single precision). The option list in Section 6.3.1 also lists FP8.

6.2. Device

The -d and --device-id options specify the ID of the IPU device to benchmark.

Note

The tool reports benchmark results in teraFLOPS. Because these results depend on the clock speed of the IPU, the tool also measures and reports the clock speed. For a multi-IPU device, the tool reports the lowest value from all the IPUs.
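
Because the measured throughput depends on the clock speed, results taken at different clock speeds are easier to compare once they are divided by the clock rate. The sketch below does this for the FP32 figures from the example at the start of this section. The function name `flops_per_cycle` is hypothetical, and treating throughput as proportional to clock speed is a simplifying assumption, not a documented property of the tool.

```python
def flops_per_cycle(teraflops, clock_mhz):
    """Return floating-point operations per clock cycle for one IPU."""
    if clock_mhz <= 0:
        raise ValueError("clock speed must be positive")
    flops = teraflops * 1e12   # teraFLOPS -> FLOPS
    cycles = clock_mhz * 1e6   # MHz -> cycles per second
    return flops / cycles


# Figures taken from the example run: 87.1078 teraFLOPS at 1850 MHz.
print(round(flops_per_cycle(87.1078, 1850)))
```

For the example run this gives roughly 47,085 operations per cycle per IPU.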

6.3. Usage

6.3.1. Allowed options

-p {fp}, --fp {fp}

Floating point precision [FP16|FP32|FP8] (default: FP16)

-d {id}, --device-id {id}

Device id

-j, --json-output

Emit JSON output

-l, --low-clock-rate

Permit low clock-rate

-h, --help

Produce help message

-v, --version

Version number
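
When gc-flops is driven from a script, the options above can be assembled into an argument list. This is a minimal Python sketch: the helpers `build_gc_flops_command` and `run_gc_flops` and their parameter names are hypothetical, and only the flags listed above are used. Making the device ID a required argument is a choice made for this sketch, and nothing is assumed about the structure of the JSON output.

```python
import subprocess

# Precision values copied from the option list above.
PRECISIONS = ("FP16", "FP32", "FP8")


def build_gc_flops_command(device_id, precision="FP16",
                           json_output=False, low_clock_rate=False):
    """Return the argument list for one gc-flops run."""
    if precision not in PRECISIONS:
        raise ValueError(f"unsupported precision: {precision}")
    cmd = ["gc-flops", "--device-id", str(device_id), "--fp", precision]
    if json_output:
        cmd.append("--json-output")
    if low_clock_rate:
        cmd.append("--low-clock-rate")
    return cmd


def run_gc_flops(device_id, **options):
    """Run gc-flops and return its standard output as text.

    Requires gc-flops on PATH and an attached IPU. The output is returned
    unparsed because its structure is not assumed here.
    """
    cmd = build_gc_flops_command(device_id, **options)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


# Show the command line equivalent to the example run, without executing it.
print(" ".join(build_gc_flops_command(0, precision="FP32")))
```

Keeping command construction separate from execution lets the argument list be checked on a machine without an IPU. Only `run_gc_flops` needs the hardware.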