Graphcore Command Line Tools
1.3.0
13. gc-links

This tool displays the status and connectivity of each of the IPU-Links that connect the IPUs. It does this by “training” the links with some data, and then checking that the data can be retrieved across all the links.

To use it, run:

gc-links -j

Note: On IPU-POD systems, the IPU-Links are trained when the partition is created, so gc-links reports that no training was performed. The output will look something like this:

{
  "iteration_0": {
      "warning": "Links already trained - no training performed",
      "overall_result": "passed",
      "discovery_method": "VIRM"
  },
  "requested_iterations": "1",
  "iterations_with_failures": "0",
  "overall_result": "passed"
}
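
A script can tell this case apart from a real training run by looking for the warning in each iteration entry. The following is a minimal Python sketch, assuming the report has the shape shown above. The helper name summarise_pod_result is hypothetical, and the sample string stands in for the captured output of gc-links -j.

```python
import json

# Sample copied from the IPU-POD output above; in practice this would be
# the captured standard output of `gc-links -j`.
SAMPLE = """
{
  "iteration_0": {
      "warning": "Links already trained - no training performed",
      "overall_result": "passed",
      "discovery_method": "VIRM"
  },
  "requested_iterations": "1",
  "iterations_with_failures": "0",
  "overall_result": "passed"
}
"""


def summarise_pod_result(text):
    """Return (passed, warnings) for an IPU-POD style report (hypothetical helper)."""
    report = json.loads(text)
    passed = report.get("overall_result") == "passed"
    # Assumption: per-iteration entries use keys of the form "iteration_<n>".
    warnings = [
        entry["warning"]
        for key, entry in report.items()
        if key.startswith("iteration_")
        and isinstance(entry, dict)
        and "warning" in entry
    ]
    return passed, warnings


passed, warnings = summarise_pod_result(SAMPLE)
print(passed, warnings)
```

A non-empty warnings list with a passing overall result indicates that the links were already trained rather than freshly tested.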

On a system where link training is available, the output will look similar to the following example:

{
  "ipu to ipu": {
      "from pcie id": "5",
      "to pcie id": "7",
      "channel": {
          "from": "NLC_E_2A",
          "to": "NLC_E_3A",
          "status": "passed",
          "gen": "4",
          "lanes": "8"
      },
      "channel": {
          "from": "NLC_E_2B",
          "to": "NLC_E_3B",
          "status": "passed",
          "gen": "4",
          "lanes": "8"
      }
  },
  "num ipus": "16",
  "overall result": "passed",
  "training fails": "0"
}

The “status” field shows the training status for each link. The “lanes” field shows the number of lanes being trained, and the “gen” field shows the link generation that was tested.
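
The sample above repeats the "channel" key inside one object. Whether the tool really emits repeated keys, or the sample is simply abbreviated, is not stated here. If it does, most JSON parsers silently keep only the last value. The sketch below shows one way to keep every channel in Python by decoding objects as lists of key/value pairs. The function name channels_in_report is hypothetical.

```python
import json

# Sample copied from the link-training output above.
SAMPLE = """
{
  "ipu to ipu": {
      "from pcie id": "5",
      "to pcie id": "7",
      "channel": {
          "from": "NLC_E_2A", "to": "NLC_E_3A",
          "status": "passed", "gen": "4", "lanes": "8"
      },
      "channel": {
          "from": "NLC_E_2B", "to": "NLC_E_3B",
          "status": "passed", "gen": "4", "lanes": "8"
      }
  },
  "num ipus": "16",
  "overall result": "passed",
  "training fails": "0"
}
"""


def channels_in_report(text):
    """Return one dict per channel, preserving repeated "channel" keys."""
    # object_pairs_hook=list makes every JSON object decode to a list of
    # (key, value) pairs, so duplicate keys are not collapsed.
    top = json.loads(text, object_pairs_hook=list)
    rows = []
    for key, link in top:
        if key != "ipu to ipu":
            continue
        # Fields shared by all channels of this link (the two PCIe ids).
        shared = {k: v for k, v in link if k != "channel"}
        for k, v in link:
            if k == "channel":
                rows.append({**shared, **dict(v)})
    return rows


for row in channels_in_report(SAMPLE):
    print(row["from pcie id"], row["to pcie id"],
          row["from"], row["to"], row["status"], row["gen"], row["lanes"])
```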

When gc-links finds that a link fails to train, the output looks like this:

{
    "from_pcie_id": "7",
    "to_pcie_id": "6",
    "channels": [
        {
            "from": "NLC_W_0B",
            "to": "NLC_W_0B",
            "num_tries": "1",
            "status": "passed",
            "gen": "4",
            "lanes": "8",
            "lowest_fom": "154"
        },
        {
            "from": "NLC_W_0C",
            "to": "NLC_W_0C",
            "num_tries": "1",
            "status": "failed",
            "gen": "4",
            "lanes": "8",
            "lowest_fom": "140"
        }
    ]
}

This output shows that gc-links failed to train one channel of the link from device 7 to device 6: NLC_W_0C. The other channel, NLC_W_0B, trained successfully.
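
Picking the failing channel out of a report like this by eye gets tedious with many links. As a minimal sketch, assuming the per-link object has exactly the shape shown above, the following Python lists every channel whose status is not "passed". The helper name failed_channels is hypothetical.

```python
import json

# Sample copied from the failure output above.
SAMPLE = """
{
    "from_pcie_id": "7",
    "to_pcie_id": "6",
    "channels": [
        {"from": "NLC_W_0B", "to": "NLC_W_0B", "num_tries": "1",
         "status": "passed", "gen": "4", "lanes": "8", "lowest_fom": "154"},
        {"from": "NLC_W_0C", "to": "NLC_W_0C", "num_tries": "1",
         "status": "failed", "gen": "4", "lanes": "8", "lowest_fom": "140"}
    ]
}
"""


def failed_channels(text):
    """Return (from_id, to_id, from_channel, to_channel) for each failure."""
    link = json.loads(text)
    return [
        (link["from_pcie_id"], link["to_pcie_id"], ch["from"], ch["to"])
        for ch in link.get("channels", [])
        if ch.get("status") != "passed"
    ]


for src, dst, ch_from, ch_to in failed_channels(SAMPLE):
    print(f"device {src} -> device {dst}: {ch_from} -> {ch_to} failed")
```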

You can run gc-inventory to show more information on devices 6 and 7:

Device
  "id": "6",
  "target": "PCIe",
  "average board temp": "26.6 C",
  "average die temp": "20.1 C",
  "board ipu index": "0",
  "board serial number": "0057.0063.822351",
  "board type": "C600",
  "board variant": "C600",
  "clock": "20MHz",
  "driver version": "1.3.0",
  "firmware major version": "2",
  "firmware minor version": "6",
  "firmware patch version": "7",
  "firmware version": "2.6.7",
  "hexoatt active size (bytes)": "0",
  "hexoatt total size (bytes)": "16911433728",
  "hexopt active size (bytes)": "0",
  "hexopt total size (bytes)": "268435456",
  "icu bootloader version": "2.6.0",
  "ipu architecture": "ipu21",
  "ipu error state": "no errors",
  "ipu power": "N\/A",
  "ipu utilisation": "0.00%",
  "ipu utilisation (session)": "0.00%",
  "link correctable error count": "0",
  "link correctable error count (session)": "0",
  "link speed": "8.0 GT\/s",
  "link width": "8",
  "numa node": "0",
  "parity error count threshold": "4",
  "parity error threshold interval": "7776000 seconds",
  "parity initialised": "1",
  "pci id": "0000:1b:00.0",
  "pcie physical slot": "PCIe Slot 12",
  "remote buffers supported": "1",
  "sysfs file id": "0",
  "total board power": "12.4 W"
Device
  "id": "7",
  "target": "PCIe",
  "average board temp": "26.6 C",
  "average die temp": "20.1 C",
  "board ipu index": "0",
  "board serial number": "0057.0063.822352",
  "board type": "C600",
  "board variant": "C600",
  "clock": "20MHz",
  "driver version": "1.3.0",
  "firmware major version": "2",
  "firmware minor version": "6",
  "firmware patch version": "7",
  "firmware version": "2.6.7",
  "hexoatt active size (bytes)": "0",
  "hexoatt total size (bytes)": "16911433728",
  "hexopt active size (bytes)": "0",
  "hexopt total size (bytes)": "268435456",
  "icu bootloader version": "2.6.0",
  "ipu architecture": "ipu21",
  "ipu error state": "no errors",
  "ipu power": "N\/A",
  "ipu utilisation": "0.00%",
  "ipu utilisation (session)": "0.00%",
  "link correctable error count": "0",
  "link correctable error count (session)": "0",
  "link speed": "8.0 GT\/s",
  "link width": "8",
  "numa node": "0",
  "parity error count threshold": "4",
  "parity error threshold interval": "7776000 seconds",
  "parity initialised": "1",
  "pci id": "0000:1b:00.0",
  "pcie physical slot": "PCIe Slot 13",
  "remote buffers supported": "1",
  "sysfs file id": "0",
  "total board power": "12.4 W"

You can see, in this example, that there’s an issue with the link between the cards in slots 12 and 13.
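
The lookup from device id to physical slot can be scripted. The sketch below parses text in the layout of the listing above (a "Device" line followed by quoted key/value lines) and builds an id-to-slot map. It assumes that layout is stable. The sample is abbreviated to a few of the fields shown, and the names parse_devices and FIELD are hypothetical.

```python
import re

# Abbreviated copy of the gc-inventory listing above (only some fields kept).
SAMPLE = '''Device
  "id": "6",
  "target": "PCIe",
  "board type": "C600",
  "pcie physical slot": "PCIe Slot 12",
  "total board power": "12.4 W"
Device
  "id": "7",
  "target": "PCIe",
  "board type": "C600",
  "pcie physical slot": "PCIe Slot 13",
  "total board power": "12.4 W"
'''

# Matches lines such as:   "link width": "8",
FIELD = re.compile(r'^\s*"([^"]+)":\s*"(.*)",?\s*$')


def parse_devices(text):
    """Split the listing into one dict of attributes per device."""
    devices = []
    for line in text.splitlines():
        if line.strip() == "Device":
            devices.append({})  # start a new device record
            continue
        match = FIELD.match(line)
        if match and devices:
            devices[-1][match.group(1)] = match.group(2)
    return devices


slots = {d["id"]: d["pcie physical slot"] for d in parse_devices(SAMPLE)}
print(slots)
```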

When training a chassis full of IPUs, the tool outputs additional information on IPU-Link failures, for example:

{
  "failures": [
    {
        "cable_id": "IPUL-00"
    },
    {
        "cable_id": "IPUL-24"
    },
    {
        "cable_id": "IPUL-25"
    }
  ]
}

The physical location of these failing cables is shown in ipu_link_channel_mapping.
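
To turn the failure report into a list of cables to inspect, the cable ids can be extracted directly. This is a minimal sketch, assuming the "failures" array has the shape shown above. The helper name failing_cables is hypothetical.

```python
import json

# Sample copied from the chassis failure output above.
SAMPLE = """
{
  "failures": [
    {"cable_id": "IPUL-00"},
    {"cable_id": "IPUL-24"},
    {"cable_id": "IPUL-25"}
  ]
}
"""


def failing_cables(text):
    """Return the cable ids listed under "failures", in report order."""
    report = json.loads(text)
    return [entry["cable_id"] for entry in report.get("failures", [])]


print(", ".join(failing_cables(SAMPLE)))
```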

13.1. Usage

13.1.1. Allowed options

-j, --json-output

Emit JSON output

-n {arg}, --num-retries {arg}

Number of link training retries (default: 3)

-d {id}, --device-id {id}

Device id (default is largest group)

-i {arg}, --num-iterations {arg}

Number of times to train each link (default: 1)

-v, --verbose

Verbose output

-p, --phy-summary

Print PHY summary after all training runs

-h, --help

Produce help message

--until-trained

Run the training until it succeeds

--train-to-gen {arg}

Train links to specified generation, default is gen 4

--single-ipu-loopback

Train links using single IPU loopback config

--version

Version number
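
When gc-links is driven from a script, the options above can be assembled into an argument list. The sketch below only builds and prints the command and does not run it. The function name and its parameters are hypothetical, the flags are the ones listed above, and device id 0 is an illustrative value.

```python
import shlex


def build_gc_links_command(device_id=None, num_retries=None,
                           num_iterations=None, train_to_gen=None,
                           json_output=True, until_trained=False):
    """Build an argument list for gc-links from the documented options."""
    args = ["gc-links"]
    if json_output:
        args.append("--json-output")
    if device_id is not None:
        args += ["--device-id", str(device_id)]
    if num_retries is not None:
        args += ["--num-retries", str(num_retries)]
    if num_iterations is not None:
        args += ["--num-iterations", str(num_iterations)]
    if train_to_gen is not None:
        args += ["--train-to-gen", str(train_to_gen)]
    if until_trained:
        args.append("--until-trained")
    return args


command = build_gc_links_command(device_id=0, num_retries=5)
print(shlex.join(command))
```

Keeping the command as a list means it can be passed unchanged to subprocess.run without shell quoting concerns.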


Revision 9d3bf6cd.