9. gc-hosttraffictest

This tool tests the data transfer between the host machine and the IPUs (in both directions).

On IPU-M2000 systems, host transfers to and from the IPU are remotely buffered. The test measures simultaneous access to that buffer by the host and by the IPU tiles.

On PCIe card systems, the IPU has direct access to the buffer in the host memory. In this case, the performance of memory accesses to the local buffer is not measured.

To use it, run:

gc-hosttraffictest -d {device_id} -j

where {device_id} is the id number returned by the gc-inventory tool.
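To run the test from a script, a small wrapper can build the command line and capture the JSON report. The following is a minimal Python sketch, not part of the Graphcore tools. build_args and run_hosttraffictest are hypothetical names, the tool is assumed to be on PATH, and in -j mode standard output is assumed to contain only the JSON document. Only options from the Allowed options list are used.

```python
import json
import subprocess

def build_args(device_id, tile_only=False, host_only=False,
               num_tiles=None, repeat=None):
    """Build a JSON-mode gc-hosttraffictest command line (hypothetical helper)."""
    args = ["gc-hosttraffictest", "-d", str(device_id), "-j"]
    if tile_only:
        args.append("--tile")   # measure tile transfers only
    if host_only:
        args.append("--host")   # measure host transfers only
    if num_tiles is not None:
        args += ["-n", str(num_tiles)]
    if repeat is not None:
        args += ["-R", str(repeat)]
    return args

def run_hosttraffictest(device_id, **options):
    """Run the tool and return (exit code, parsed JSON report or None)."""
    proc = subprocess.run(build_args(device_id, **options),
                          capture_output=True, text=True)
    try:
        # Assumption: in -j mode, stdout holds only the JSON document.
        report = json.loads(proc.stdout)
    except json.JSONDecodeError:
        report = None
    return proc.returncode, report
```

For example, run_hosttraffictest(0, num_tiles=16) runs gc-hosttraffictest -d 0 -j -n 16 and returns its exit code together with the parsed report.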

If JSON output is selected, the output on an IPU-M2000 system will look something like this:

{
    "configuration": {
        "number_of_ipus": "1",
        "tile_transfers_enabled": "true",
        "tiles_per_ipu": "32",
        "transfer_size_64byte_blocks": "4",
        "iterations": "100000",
        "host_transfers_enabled": "true"
    },
    "repeat_0": {
        "host_from_buffer_from_tile": {
            "description": "host <- buffer <- tile",
            "duration_seconds": "3.4895363750000001",
            "host_data_transferred_bytes": "10234101760",
            "tile_data_transferred_bytes": "13107200000",
            "host_transfer_speed_gbps": "23.462375880807375",
            "tile_transfer_speed_gbps": "30.049149437509445"
        },
        "host_to_buffer_to_tile": {
            "description": "host -> buffer -> tile",
            "duration_seconds": "3.970404303",
            "host_data_transferred_bytes": "28034727936",
            "tile_data_transferred_bytes": "13107200000",
            "host_transfer_speed_gbps": "56.487401879586365",
            "tile_transfer_speed_gbps": "26.409804140291353"
        },
        "host_to_buffer_from_tile": {
            "description": "host -> buffer <- tile",
            "duration_seconds": "4.2574377639999996",
            "host_data_transferred_bytes": "11576279040",
            "tile_data_transferred_bytes": "13107200000",
            "host_transfer_speed_gbps": "21.752574542156008",
            "tile_transfer_speed_gbps": "24.629273711680266"
        },
        "host_from_buffer_to_tile": {
            "description": "host <- buffer -> tile",
            "duration_seconds": "3.8982566639999998",
            "host_data_transferred_bytes": "19612565504",
            "tile_data_transferred_bytes": "13107200000",
            "host_transfer_speed_gbps": "40.248895225642841",
            "tile_transfer_speed_gbps": "26.89858801970357"
        }
    }
}
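Every value in the JSON report is a string, so numbers must be converted before use. The reported speeds in the example are consistent with decimal gigabits per second (bytes × 8 / duration / 10^9), and the tile data volume equals tiles_per_ipu × iterations × 4096 bytes. The Python sketch below recomputes both; it assumes "4KB" means 4096 bytes and covers only the single-IPU case shown, and the helper names are hypothetical.

```python
import json

BYTES_PER_ITERATION = 4096  # assumption: one "4KB" transfer per tile per iteration

def load_report(text):
    """Parse the JSON document printed by gc-hosttraffictest -j."""
    return json.loads(text)

def gbps(num_bytes, seconds):
    """Convert a byte count and a duration into decimal gigabits per second."""
    return num_bytes * 8 / seconds / 1e9

def rate(test, side):
    """Recompute the rate for one side ("host" or "tile"); None if not reported."""
    key = side + "_data_transferred_bytes"
    if key not in test:
        return None
    return gbps(int(test[key]), float(test["duration_seconds"]))

def expected_tile_bytes(configuration):
    """Tile data per test: tiles_per_ipu x iterations x 4096 bytes (single IPU)."""
    return (int(configuration["tiles_per_ipu"]) * int(configuration["iterations"])
            * BYTES_PER_ITERATION)

def summarize(report):
    """Yield (repeat, description, host Gbps, tile Gbps) for every test entry."""
    for repeat, tests in report.items():
        if not repeat.startswith("repeat_"):
            continue  # skip the "configuration" block
        for test in tests.values():
            yield repeat, test["description"], rate(test, "host"), rate(test, "tile")
```

With the example above, the recomputed rates agree with the reported *_transfer_speed_gbps fields, and expected_tile_bytes(report["configuration"]) gives 13,107,200,000, matching tile_data_transferred_bytes.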

The output will be plain text if the -j option is not specified. For example:

$ gc-hosttraffictest --device 0
Running test sequence 1/1
host <- buffer <- tile:  23.46 Gbps RDMA host read, 30.00 Gbps tile write, for 3.50 seconds.
host -> buffer -> tile:  55.78 Gbps RDMA host write, 26.40 Gbps tile read, for 3.97 seconds.
host -> buffer <- tile:  21.75 Gbps RDMA host write, 24.63 Gbps tile write, for 4.26 seconds.
host <- buffer -> tile:  40.38 Gbps RDMA host read, 26.97 Gbps tile read, for 3.89 seconds.

Options are also available to measure IPU tile transfers or host transfers independently.

For example:

$ gc-hosttraffictest --device 0 --tile
Running test sequence 1/1
buffer <- tile:  37.38 Gbps tile write, for 2.80 seconds.
buffer -> tile:  51.92 Gbps tile read, for 2.02 seconds.

$ gc-hosttraffictest --device 0 --host
Running test sequence 1/1
host -> buffer:  90.72 Gbps RDMA host write, for 5.01 seconds.
host <- buffer:  74.85 Gbps RDMA host read, for 5.00 seconds.
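JSON output is the more robust choice for scripts, but plain-text results in the format shown in these examples can also be parsed with a regular expression. This Python sketch assumes every result line has the form "description:  value Gbps label[, ...], for n seconds." and ignores all other lines, such as "Running test sequence 1/1". parse_line and parse_output are hypothetical helpers.

```python
import re

# A result line: "<description>:  <value> Gbps <label>[, ...], for <n> seconds."
LINE_RE = re.compile(
    r"^(?P<desc>[^:]+):\s+(?P<rates>.+?),\s*for (?P<secs>[\d.]+) seconds\.$")
RATE_RE = re.compile(r"([\d.]+) Gbps ([^,]+)")

def parse_line(line):
    """Return (description, {label: Gbps}, seconds), or None for other lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # e.g. "Running test sequence 1/1" or a blank line
    rates = {label.strip(): float(value)
             for value, label in RATE_RE.findall(match.group("rates"))}
    return match.group("desc").strip(), rates, float(match.group("secs"))

def parse_output(text):
    """Parse every result line of a plain-text run."""
    return [r for r in map(parse_line, text.splitlines()) if r is not None]
```

The same parser handles both the two-sided lines (host and tile rates) and the one-sided lines produced by --tile or --host.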

More detailed information and a progress bar are shown when the -v option is used.

If an error occurs during the test, gc-hosttraffictest returns a non-zero exit code and prints an error message to the terminal. When JSON output is enabled, an error field is added to the entry for the failing test.
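A script can check both signals: the exit code and any error fields in the JSON report. This Python sketch assumes the error field is a key literally named "error" (the exact key name is not given here) and that test entries sit under repeat_N keys as in the JSON example. failed_tests and run_succeeded are hypothetical helpers.

```python
def failed_tests(report):
    """Return "repeat/test" names whose entry carries an error field.

    Assumption: the error field is a key named "error" in the test entry.
    """
    failures = []
    for repeat, tests in report.items():
        if not repeat.startswith("repeat_"):
            continue  # skip the "configuration" block
        for name, test in tests.items():
            if "error" in test:
                failures.append(f"{repeat}/{name}")
    return failures

def run_succeeded(exit_code, report=None):
    """A run passes only with exit code 0 and no failing test entries."""
    if exit_code != 0:
        return False
    return report is None or not failed_tests(report)
```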

9.1. Usage

9.1.1. Allowed options

-j, --json-output

Emit JSON output

-d {arg}, --device-id {arg}

Device ID, as returned by gc-inventory

-n {arg}, --num-tiles {arg}

Number of tiles: 1 to 32 (default: 32)

-r, --remote-buffers

Use remote buffers (HEXOATT, without using HEXOPT)

-p {arg}, --payload-blocks {arg}

Number of 64-byte blocks per transfer: 2 or 4 (default: 4)

-i {arg}, --iterations {arg}

Number of 4KB transfers per tile (affects test duration) (default: 100000)

--test-duration {arg}

Test duration in seconds for tests without IPU tile access (default: 5)

--tile-read

Measure IPU tile read performance

--tile-write

Measure IPU tile write performance

--tile

Equivalent to specifying both --tile-read and --tile-write

--host-read

Measure host RDMA read performance

--host-write

Measure host RDMA write performance

--host

Equivalent to specifying both --host-read and --host-write

--min-tile-bandwidth {arg}

Minimum tile bandwidth (Gbps) expected - fails if not reached (default: 0)

--min-host-bandwidth {arg}

Minimum host bandwidth (Gbps) expected - fails if not reached (default: 0)

--no-timeout

Suppress test timeout

--suppress-failure-reset

Suppress safety reset on failure (for debug)

-v, --verbose

Verbose output

--use-secondary

Use the C600 secondary PCIe interface

--dump-tile-overview

Dump tile overview upon failure

-h, --help

Produce help message

--version

Version number

-R {arg}, --repeat {arg}

Number of times to repeat test (default: 1)
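The --min-tile-bandwidth and --min-host-bandwidth options make the tool itself fail below a limit, and -R adds repetitions. When a saved JSON report needs re-checking against different limits, or its repetitions need summarizing, a post-processing step like the Python sketch below can help. It is not part of gc-hosttraffictest; it assumes the *_transfer_speed_gbps field names from the JSON example and that each repetition appears as its own repeat_N block (only repeat_0 is shown in that example). The helper names are hypothetical.

```python
from statistics import mean

def _tests(report):
    """Yield (repeat, name, entry) for each test in each repeat_N block."""
    for repeat, tests in report.items():
        if repeat.startswith("repeat_"):
            for name, entry in tests.items():
                yield repeat, name, entry

def below_threshold(report, min_tile_gbps=0.0, min_host_gbps=0.0):
    """List (repeat, test, side, Gbps) entries below the given minimum."""
    problems = []
    for repeat, name, entry in _tests(report):
        for side, minimum in (("tile", min_tile_gbps), ("host", min_host_gbps)):
            value = entry.get(side + "_transfer_speed_gbps")
            if value is not None and float(value) < minimum:
                problems.append((repeat, name, side, float(value)))
    return problems

def aggregate_repeats(report, field="host_transfer_speed_gbps"):
    """Return {test: (min, mean, max)} of one speed field across all repeats."""
    samples = {}
    for _, name, entry in _tests(report):
        if field in entry:
            samples.setdefault(name, []).append(float(entry[field]))
    return {name: (min(v), mean(v), max(v)) for name, v in samples.items()}
```

As with the tool's own options, a minimum of 0 (the default) never flags anything.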