5.1.15. Driver and utilities changelog

Kernel module

1.1.2

  • T54187: Support identification of the Bow platform.

  • T55585: Constants generator for IPU kernel driver.

  • T55971: Fix potential race condition in device locking in the IPU PCIe driver.

  • T57475: Fix incorrect logging of parity errors within the IPU PCIe driver.

  • T57508: Trim link speed string in gc-monitor.

  • T61384: Reduce the number of clock change log entries generated on Bow when power saving.

  • T62155: Improve message clarity in kernel log for tile parity error detection.

  • T62647: Ensure all driver internal ICU transactions handle communication errors.

  • T62648: Sort tile parity error sysfs log so that tiles with the most errors are seen first.

  • T63738: Log error code when failing to open device node in attach.

  • T63944: Fix issue on PCIe systems where IPUs can be incorrectly marked as in use by an “unknown” process.
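
The ordering described in T62648 can be illustrated with a minimal Python sketch. The record layout (a tile number paired with an error count), the tie-break by tile number, and the function name are assumptions made for this example only; the actual sysfs log format is not described in this changelog.

```python
from typing import List, Tuple


def sort_tiles_by_errors(records: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Order hypothetical (tile, error_count) pairs so the highest counts come first.

    Ties are broken by ascending tile number (an assumption for this sketch).
    """
    return sorted(records, key=lambda r: (-r[1], r[0]))


if __name__ == "__main__":
    # Made-up sample data, not real driver output.
    sample = [(12, 1), (3, 7), (40, 7), (5, 0)]
    for tile, count in sort_tiles_by_errors(sample):
        print(f"tile {tile}: {count} parity errors")
```

Sorting by descending count puts the tiles most likely to need attention at the top of the log.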

1.1.1 (SDK 2.5.1)

  • T50828: Remove deprecated sync utilisation code.

  • T52353: Removed the need for GCDA_MONITOR to get power/temperature values.

  • T52721: Preserve mark counts in non-POSTED modes on exit.

  • T52775: Implemented detection of multiple tile parity errors.

  • T52776: If an IPU memory failure occurs, record unrecoverable error and mark device as unusable.

  • T55372: Added new correctable error counters that clear on IPU reset.

  • T55565: Improved power and temperature reporting.

  • T56430: Detect when processes are attached to a device from another namespace.

Low-level libraries and tools

2.6.0+5997

  • T51433: Fix potential hang in device detach.

  • T54187: Support identification of Bow platform.

  • T55309: ICU interfaces updated to support FW 2.5.0.

  • T55372: Added new correctable error counters that clear on IPU reset.

  • T55426: Fixed PVTI PopRun exception.

  • T56479: Remove the need to set GCDA_MONITOR when logging PVTI telemetry.

  • T57081: gc-docker now automatically passes down any VIPU environment variables unless --no-default-env is given. The new --pass-env option allows additional host environment variables to be passed down.

  • T58063: Improve logging in client when a read_config_register call happens after detach.

  • T58090: Improve clarity in exception message from the IPU bootloader for sync misconfiguration.

  • T58258: Fix for a segmentation fault in gc-iputraffictest single-ipu mode.

  • T58599: Improved IPUoF client exception message when IPU is already in use.

  • T59010: Fix gc-monitor title column width with native PCIe devices.

  • T59236: Avoid a timeout on a read_config_space request made shortly after a partition reset.

  • T59459: Enriched the gc-info error message for partitions that are not ACTIVE.

  • T59533: gc-hosttraffictest uses the largest multi-IPU device or the first single-IPU device when the device ID option is not specified.

  • T59544: Allowed using gc-hosttraffictest against 64-IPU partitions.

  • T59578: Fixed possible IPUoF segmentation fault during process fork.

  • T59847: Allow more control over IPU-Link training parameters.

  • T60286: Fix the core dump in gc-lldb when writing an invalid register.

  • T60396: Fix gc-monitor CSV mode when a device attribute is missing.

  • T60926: Fix fabric board type discovery for old IPUoF servers.

  • T60999: Include partition status in the “Fabric error” message.

  • T61387: Add ipu_arch_info support for instructions with no operands.

  • T61484: Don’t call the PCIe_reset IOCTL from the user space library if the PCIe driver version doesn’t implement it.

  • T61740: Host sync state dump includes IPUoF HSP mark count details.

  • T61754: Eliminate risk of unnecessary mark update upon detach (still necessary for POSTED sync notification mode).

  • T62884: The IPUOF_VIPU_API_HOST environment variable is now optional; the default value is “localhost”.

  • T63342: Avoid possible race when attempting to connect to a connected server.

  • T63422: Throw an exception rather than hanging upon an IPU parity reset failure.

  • T65044: Increase server startup grace period from 120s to 180s.
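
The optional-variable behaviour described in T62884 can be sketched in Python as follows. This is an illustration only: the real lookup happens inside the IPUoF client library, and the function name `resolve_vipu_api_host` is hypothetical. The sketch also assumes that only an unset variable falls back to the default; how an empty value is treated is not stated in this changelog.

```python
import os
from typing import Mapping, Optional

# Default taken from the changelog entry for T62884.
DEFAULT_VIPU_API_HOST = "localhost"


def resolve_vipu_api_host(env: Optional[Mapping[str, str]] = None) -> str:
    """Return IPUOF_VIPU_API_HOST if set, otherwise the documented default.

    `env` defaults to the process environment; passing a mapping makes the
    function easy to exercise without modifying os.environ.
    """
    env = os.environ if env is None else env
    return env.get("IPUOF_VIPU_API_HOST", DEFAULT_VIPU_API_HOST)


if __name__ == "__main__":
    print(resolve_vipu_api_host())
```

For example, `resolve_vipu_api_host({})` returns the default, while a mapping that sets IPUOF_VIPU_API_HOST returns that value unchanged.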

2.5.1

  • T38729: Added gc-hostlatencytest.

  • T39431: Added GCDA API to allow querying the available PL-DDR on an IPU-M.

  • T40698: Updated gc-hosttraffictest to provide performance statistics for host memory transfers.

  • T41646: Added IPU-M version info to gc-monitor.

  • T44446: Force gRPC to not use a proxy server.

  • T48984: Refactor conversion of fabric exceptions to graphcore_target_access exceptions to improve maintainability.

  • T49018: Extended the PVTI API to allow setting of user thread names.

  • T49170: Add IPU power profile query option to gc-info.

  • T49902: Removed PCIe ID field from gc-monitor for Fabric devices.

  • T49958: Updated gc-info -l to return an error code if no devices are found.

  • T50043: gcipuinfo: add path parameter to application event record retrieval API.

  • T50828: Remove deprecated sync utilisation code.

  • T51093: gcipuinfo: add attributes to application event record listing IPUoF hosts.

  • T51249: GC tools report device discovery errors when no IPUs are found.

  • T51264: Fix issues when attach is aborted at an early stage.

  • T51460: Add support for static partitions with varying sync types for the hardware testing command line tools.

  • T51503: Avoid a segmentation fault when using legacy environment variable IPUOF_CONFIG_PATH with an empty value.

  • T51526: gc-monitor: track IPUs that are in use by other headnodes.

  • T51527: gc-monitor: when IPUs are in use by other headnodes, display hostname.

  • T51694: Add error checking in GCDA when requesting invalid buffers.

  • T51744: Add option to set the duration for --host tests in gc-hosttraffictest.

  • T51774: Extend internal interface used by V-IPU to support the enabling and disabling of NLC links.

  • T51832: Fix a rare issue in hgwio_server that can temporarily cause failure to attach.

  • T51974: IPUoF client calls ibv_fork_init() during RDMA client initialisation.

  • T52102: Improve error handling on attach in IPUoF.

  • T52132: Ensure all buffers are detached during IPUoF device detach.

  • T52248: Add sync group configuration debug information to host sync timeout exceptions.

  • T52249: The bootloader now throws GraphcoreDeviceAccessExceptions::ipu_bootloader_missing_sync for any bootloader sync errors so that they can be caught for sync debug reporting.

  • T52458: Added option to gc-monitor and gc-info to view IPUs in other partitions.

  • T52459: gcipuinfo can return device attributes and run health checks on devices in other partitions.

  • T52606: Improve IPUoF client HSP debug logging messages.

  • T52609: Fix fallback strategies used in IPUoF client HSP polling.

  • T52721: Preserve mark counts in non-POSTED modes on exit.

  • T52775: Implemented detection of multiple tile parity errors.

  • T52776: If an IPU memory failure occurs, record unrecoverable error and mark device as unusable.

  • T53084: Updated IPUoF to allow tools to see IPU devices outside of the current partition.

  • T53170: Prevent the increment of marks on devices that have a GSP pin configuration that does not support HSP. This improves IPUoF performance for the bootloader and avoids confusing debug messages.

  • T53188: Fix docker images not working with Broadcom RNIC.

  • T53326: Make gcipuinfo report no IPU devices found as an error.

  • T53422: Fixed HSP update race between IPUoF client and server.

  • T53451: Make several attempts at checking if PL DDR is cleared at startup.

  • T53537: Order IPU-M devices numerically and by IPU Id in PCIe in gc-monitor display.

  • T53741: Fixed popc --version deadlock when PVTI is enabled.

  • T53755: Avoid RPC timeouts after first attach.

  • T53822: Added support for detection and handling of multiple parity errors.

  • T53884: Updated IPUoF RDMA QP retry count to improve link reliability.

  • T53895: Optimise IPUoF behavior on first attach.

  • T53977: Fix IPUoF race condition when receiving an attach request during detach.

  • T54030: Remove connection disconnect when get_device_info call fails.

  • T54110: Improve IPUoF mirror fence logging.

  • T54119: IPUoF server enables memory error checking.

  • T54468: Improved the IPUoF error message when there’s been an issue creating the connection.

  • T54615: Improve recovery and debug in gc-hosttraffictest when a test times out.

  • T54685: gc-monitor --all-partitions now ignores partitions in an error state rather than terminating with an error.

  • T55364: Improve availability on IPUoF server start.

  • T55389: Added link to tutorials in the PVTI user guide.

  • T55407: Reset IPU upon any gc-hosttraffictest failure to recover host interface.

  • T55411: Reduce excessive output for gc-memorytest in verbose mode.

  • T55426: Fixed PVTI PopRun exception when the trace file cannot be created or if the tables already exist in the database.

  • T55565: Improved power and temperature reporting.

  • T55629: Extended gc-binary API to support the creation of tile IPU archives in incremental steps rather than at once.

  • T55942: Fixed a rare double free of allocated memory in the IPUoF server when the IPUoF connection fails.

  • T56150: Remove unnecessary files from the release packages.