6. gc-flops
This tool allows you to benchmark the number of floating point operations per second (FLOPS) on one or more IPU processors. The tool supports Mk2 architectures only.
For example:
$ gc-flops -d 0 -p FP32
This test can take up to 3 minutes to run.
Loading binary into IPU(s)
Starting FP32 FLOPS calculation...
Flops per IPU (FP32): 87.1078 teraFLOPS (took: 27.8703 seconds)
IPU clock speed: 1850 MHz
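When results need to be collected by a script, the two result lines shown above can be extracted with regular expressions. The following is a minimal sketch that assumes the text output always has the same shape as the sample above. The function name `parse_gc_flops_output` and the dictionary keys are hypothetical and are not part of the tool. The tool also has an option to emit JSON output, which is likely to be a more robust source if it is available.

```python
import re

# Sample text copied from the example run above.
SAMPLE_OUTPUT = """\
This test can take up to 3 minutes to run.
Loading binary into IPU(s)
Starting FP32 FLOPS calculation...
Flops per IPU (FP32): 87.1078 teraFLOPS (took: 27.8703 seconds)
IPU clock speed: 1850 MHz
"""

# Patterns are an assumption based on the sample output only.
FLOPS_RE = re.compile(
    r"Flops per IPU \((?P<precision>\w+)\): "
    r"(?P<tflops>[\d.]+) teraFLOPS \(took: (?P<seconds>[\d.]+) seconds\)"
)
CLOCK_RE = re.compile(r"IPU clock speed: (?P<mhz>[\d.]+) MHz")


def parse_gc_flops_output(text):
    """Hypothetical helper: pull the benchmark figures out of the text output."""
    flops = FLOPS_RE.search(text)
    clock = CLOCK_RE.search(text)
    if flops is None or clock is None:
        raise ValueError("text does not look like the sample gc-flops output")
    return {
        "precision": flops.group("precision"),
        "teraflops": float(flops.group("tflops")),
        "seconds": float(flops.group("seconds")),
        "clock_mhz": float(clock.group("mhz")),
    }


result = parse_gc_flops_output(SAMPLE_OUTPUT)
```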
6.1. Precision
The --fp and -p command options select the floating point precision. You can choose between FP16 (the default), which is 16-bit floating point, also known as half-precision floating point, or FP32 for single-precision floating point.
6.2. Device
The --device-id and -d command options specify the IPU device to benchmark.
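The precision and device options can be combined in a single invocation. The sketch below builds the argument list for such a run using only the short options described in this section. The helper name `build_gc_flops_command` is hypothetical, and the check that the device id is a non-negative integer is an assumption rather than documented behaviour. The accepted precision names are taken from the allowed options listed in this section.

```python
# Precision names as listed in the allowed options of this section.
ALLOWED_PRECISIONS = ("FP16", "FP32", "FP8")


def build_gc_flops_command(device_id, precision="FP16"):
    """Hypothetical helper: return the argument list for one gc-flops run."""
    if precision not in ALLOWED_PRECISIONS:
        raise ValueError(f"unsupported precision: {precision!r}")
    # Assumption: device ids are non-negative integers.
    if isinstance(device_id, bool) or not isinstance(device_id, int) or device_id < 0:
        raise ValueError(f"invalid device id: {device_id!r}")
    return ["gc-flops", "-d", str(device_id), "-p", precision]


# Equivalent to the example command: gc-flops -d 0 -p FP32
command = build_gc_flops_command(0, "FP32")
```

The resulting list can be passed to `subprocess.run` on a machine where the tool is installed.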
Note
The tool reports benchmark results in teraFLOPS. The clock speed of the IPU affects these results. The tool also measures and reports the clock speed. For a multi-IPU device, the tool reports the lowest value from all the IPUs.
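Because the result depends on the clock speed, it can help to normalise a measurement before comparing runs made at different clock rates. The sketch below converts a result into floating point operations per clock cycle and rescales it to another clock rate. Both function names are hypothetical, and the rescaling assumes that throughput is directly proportional to clock speed, which is a simplification that this document does not confirm.

```python
def flop_per_cycle(teraflops, clock_mhz):
    """Floating point operations completed per clock cycle."""
    return (teraflops * 1e12) / (clock_mhz * 1e6)


def scale_to_clock(teraflops, measured_mhz, target_mhz):
    """Estimate teraFLOPS at another clock rate.

    Assumption: throughput scales linearly with clock speed.
    """
    return teraflops * (target_mhz / measured_mhz)


# Figures from the example run: 87.1078 teraFLOPS at 1850 MHz.
per_cycle = flop_per_cycle(87.1078, 1850)
# Estimate for a hypothetical run at half the measured clock rate.
half_clock_estimate = scale_to_clock(87.1078, 1850, 925)
```

For the example run this gives roughly 47,000 floating point operations per cycle.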
6.3. Usage
6.3.1. Allowed options
| Option | Description |
| --fp, -p | Floating point precision [FP16|FP32|FP8] (default: FP16) |
| --device-id, -d | Device id |
| | Emit JSON output |
| | Permit low clock-rate |
| | Produce help message |
| | Version number |