5.1.15. Driver and utilities changelog
Kernel module
1.1.2
T54187: Support identification of the Bow platform.
T55585: Constants generator for IPU kernel driver.
T55971: Fix potential race condition in device locking in the IPU PCIe driver.
T57475: Fix incorrect logging of parity errors within the IPU PCIe driver.
T57508: Trim link speed string in gc-monitor.
T61384: Reduce the number of clock change log entries generated on Bow when power saving.
T62155: Improve message clarity in kernel log for tile parity error detection.
T62647: Ensure all driver-internal ICU transactions handle communication errors.
T62648: Sort the tile parity error sysfs log so that tiles with the most errors are listed first.
T63738: Log error code when failing to open device node in attach.
T63944: Fix an issue on PCIe systems where IPUs could be incorrectly marked as in use by an “unknown” process.
1.1.1 (SDK 2.5.1)
T50828: Remove deprecated sync utilisation code.
T52353: Removed the need for GCDA_MONITOR to get power/temperature values.
T52721: Preserve mark counts in non-POSTED modes on exit.
T52775: Implemented detection of multiple tile parity errors.
T52776: If an IPU memory failure occurs, record unrecoverable error and mark device as unusable.
T55372: Added new correctable error counters that clear on IPU reset.
T55565: Improved power and temperature reporting.
T56430: Detect when processes are attached to a device from another namespace.
Low-level libraries and tools
2.6.0+5997
T51433: Fix potential hang in device detach.
T54187: Support identification of Bow platform.
T55309: ICU interfaces updated to support FW 2.5.0.
T55372: Added new correctable error counters that clear on IPU reset.
T55426: Fixed PVTI PopRun exception.
T56479: Remove the need to set GCDA_MONITOR when logging PVTI telemetry.
T57081: For gc-docker, unless --no-default-env is given, any VIPU environment variables will automatically be passed down. The new --pass-env option allows additional host environment variables to be passed down.
T58063: Improve logging in the client when a read_config_register call happens after detach.
T58090: Improve the clarity of the exception message from the IPU bootloader for sync misconfiguration.
T58258: Fix for a segmentation fault in gc-iputraffictest single-ipu mode.
T58599: Improved the IPUoF client exception message when the IPU is already in use.
T59010: Fix gc-monitor title column width with native PCIe devices.
T59236: Avoid a timeout on a read_config_space request made shortly after a partition reset.
T59459: Enriched the gc-info error message for partitions that are not ACTIVE.
T59533: gc-hosttraffictest uses the largest multi-IPU device, or the first single-IPU device, when the device ID option is not specified.
T59544: Allowed using gc-hosttraffictest against 64-IPU partitions.
T59578: Fixed a possible IPUoF segmentation fault during process fork.
T59847: Allow more control over IPU-Link training parameters.
T60286: In gc-lldb, fix the core dump when writing an invalid register.
T60396: A fix for gc-monitor CSV mode when a device attribute is missing.
T60926: Fix fabric board type discovery for old IPUoF servers.
T60999: Include partition status into “Fabric error” message.
T61387: Add ipu_arch_info support for instructions with no operands.
T61484: Don’t call the PCIe_reset IOCTL from the user-space library if the PCIe driver version doesn’t implement it.
T61740: Host sync state dump includes IPUoF HSP mark count details.
T61754: Eliminate the risk of an unnecessary mark update upon detach (the update is still necessary for POSTED sync notification mode).
T62884: The IPUOF_VIPU_API_HOST environment variable is now optional; its default value is “localhost”.
T63342: Avoid a possible race when attempting to connect to an already connected server.
T63422: Throw an exception rather than hanging upon an IPU parity reset failure.
T65044: Increase server startup grace period from 120s to 180s.
2.5.1
T38729: Added gc-hostlatencytest.
T39431: Added a GCDA API to allow querying the available PL-DDR on an IPU-M.
T40698: Updated gc-hosttraffictest to provide performance statistics for host memory transfers.
T41646: Added IPU-M version info to gc-monitor.
T44446: Force gRPC to not use a proxy server.
T48984: Refactor conversion of fabric exceptions to graphcore_target_access exceptions to improve maintainability.
T49018: Extended the PVTI API to allow setting of user thread names.
T49170: Add an IPU power profile query option to gc-info.
T49902: Removed the PCIe ID field from gc-monitor for Fabric devices.
T49958: Updated gc-info -l to return an error code if no devices are found.
T50043: gcipuinfo: add a path parameter to the application event record retrieval API.
T50828: Remove deprecated sync utilisation code.
T51093: gcipuinfo: add attributes to the application event record listing IPUoF hosts.
T51249: GC tools report device discovery errors when no IPUs are found.
T51264: Fix issues when attach is aborted at an early stage.
T51460: Add support for static partitions with varying sync types for the hardware testing command line tools.
T51503: Avoid a segmentation fault when using the legacy environment variable IPUOF_CONFIG_PATH with an empty value.
T51526: gc-monitor: track IPUs that are in use by other headnodes.
T51527: gc-monitor: when IPUs are in use by other headnodes, display the hostname.
T51694: Add error checking in GCDA when requesting invalid buffers.
T51744: Add an option to set the duration for --host tests in gc-hosttraffictest.
T51774: Extend the internal interface used by V-IPU to support the enabling and disabling of NLC links.
T51832: Fix a rare issue in hgwio_server that can temporarily cause failure to attach.
T51974: IPUoF client calls ibv_fork_init() during RDMA client initialisation.
T52102: Improve error handling on attach in IPUoF.
T52132: Ensure all buffers are detached during IPUoF device detach.
T52248: Add sync group configuration debug information to host sync timeout exceptions.
T52249: The bootloader now throws GraphcoreDeviceAccessExceptions::ipu_bootloader_missing_sync for any bootloader sync errors so that they can be caught for sync debug reporting.
T52458: Added an option to gc-monitor and gc-info to view IPUs in other partitions.
T52459: gcipuinfo can return device attributes and run health checks on devices in other partitions.
T52606: Improve IPUoF client HSP debug logging messages.
T52609: Fix fallback strategies used in IPUoF client HSP polling.
T52721: Preserve mark counts in non-POSTED modes on exit.
T52775: Implemented detection of multiple tile parity errors.
T52776: If an IPU memory failure occurs, record unrecoverable error and mark device as unusable.
T53084: Updated IPUoF to allow tools to see IPU devices outside of the current partition.
T53170: Prevent the increment of marks on devices that have a GSP pin configuration that does not support HSP. This improves IPUoF performance for the bootloader and avoids confusing debug messages.
T53188: Fix docker images not working with Broadcom RNIC.
T53326: Make gcipuinfo report no IPU devices found as an error.
T53422: Fixed HSP update race between IPUoF client and server.
T53451: Make several attempts at checking if PL DDR is cleared at startup.
T53537: Order IPU-M devices numerically, and by IPU ID for PCIe, in the gc-monitor display.
T53741: Fixed popc --version deadlock when PVTI is enabled.
T53755: Avoid RPC timeouts after first attach.
T53822: Added support for detection and handling of multiple parity errors.
T53884: Updated IPUoF RDMA QP retry count to improve link reliability.
T53895: Optimise IPUoF behavior on first attach.
T53977: Fix IPUoF race condition when receiving an attach request during detach.
T54030: Remove the connection disconnect when a get_device_info call fails.
T54110: Improve IPUoF mirror fence logging.
T54119: IPUoF server enables memory error checking.
T54468: Improved the IPUoF error message when there’s been an issue creating the connection.
T54615: Improve recovery and debug in gc-hosttraffictest when a test times out.
T54685: gc-monitor --all-partitions now ignores partitions in an error state rather than terminating with an error.
T55364: Improve availability on IPUoF server start.
T55389: Added link to tutorials in the PVTI user guide.
T55407: Reset the IPU upon any gc-hosttraffictest failure to recover the host interface.
T55411: Reduce excessive output for gc-memorytest in verbose mode.
T55426: Fixed PVTI PopRun exception when the trace file cannot be created or if the tables already exist in the database.
T55565: Improved power and temperature reporting.
T55629: Extended the gc-binary API to support the creation of tile IPU archives in incremental steps rather than all at once.
T55942: Fixed a rare double free of allocated memory in the IPUoF server when the IPUoF connection fails.
T56150: Remove unnecessary files from the release packages.