Added support for float8 inference (host-based conversion, variables, constants, view changers, cast, matmul and conv).
Added support for replicated tensor sharding (RTS) and replica-grouped initialisation with multiple instances for remote variables.
Added the following ops:
diag method to the
Disabled storing variable data with executable caching.
subsample op to handle slicing with step > 1.
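As an illustration of what slicing with a step greater than 1 selects, the following plain-Python sketch keeps every n-th element along each axis of a nested list. The helper name subsample_2d is hypothetical and is not the PopART or PopXL API; it only mirrors the indexing semantics.

```python
def subsample_2d(rows, row_step, col_step):
    """Keep every row_step-th row and every col_step-th column.

    Hypothetical helper for illustration only; not a PopART/PopXL call.
    """
    if row_step < 1 or col_step < 1:
        raise ValueError("steps must be positive integers")
    # Slice the outer list with the row step, then each kept row with the
    # column step.
    return [row[::col_step] for row in rows[::row_step]]


data = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
]
result = subsample_2d(data, row_step=2, col_step=2)
print(result)
```

With a step of 2 on both axes, only rows 0 and 2 and columns 0 and 2 survive; a step of 1 leaves the data unchanged.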
Add abs, cos and sin operations, providing additional operation coverage.
Add ability to compile without acquiring IPUs, providing improved IPU resource utilisation.
Add support for the if operation, allowing for more expressive control flow.
Add ability to broadcast binary operations.
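To make the broadcasting item concrete, here is a minimal sketch of how the output shape of a broadcast binary operation can be derived. It assumes NumPy-style (ONNX multidirectional) rules, which these notes do not state explicitly, and the function name broadcast_shape is hypothetical rather than part of the library.

```python
from itertools import zip_longest


def broadcast_shape(lhs, rhs):
    """Return the broadcast shape of two operand shapes.

    Assumes NumPy-style rules: shapes are aligned from the trailing
    dimension, and a dimension of size 1 stretches to match the other.
    Hypothetical helper for illustration only.
    """
    result = []
    # Walk both shapes from the last dimension, padding the shorter with 1.
    for a, b in zip_longest(reversed(lhs), reversed(rhs), fillvalue=1):
        if a == b or b == 1:
            result.append(a)
        elif a == 1:
            result.append(b)
        else:
            raise ValueError(f"shapes {lhs} and {rhs} are not broadcastable")
    return tuple(reversed(result))


print(broadcast_shape((2, 3), (3,)))
print(broadcast_shape((4, 1, 3), (5, 1)))
```

Under these assumed rules, adding a tensor of shape (3,) to one of shape (2, 3) yields shape (2, 3), while shapes such as (2, 3) and (4,) are rejected.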
Enable support for all stride configurations in collectives and replicated variables.
The ability to stride replicas in collectives is useful for advanced use cases like Tensor Model Parallel (TMP).
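The effect of striding replicas can be sketched in plain Python by listing which replicas end up in each group. The function name strided_replica_groups and the exact layout rule (replicas in a group are stride apart, and groups are packed into blocks of stride * group_size replicas) are assumptions made for illustration; they are not taken from these notes.

```python
def strided_replica_groups(num_replicas, stride, group_size):
    """List the replicas in each group for a given stride and group size.

    Assumed layout (illustrative, not the library's documented rule):
    replicas are split into blocks of stride * group_size, and within a
    block, replicas that share the same index modulo stride form a group.
    """
    block = stride * group_size
    if stride < 1 or group_size < 1 or num_replicas % block != 0:
        raise ValueError("num_replicas must be a multiple of stride * group_size")
    groups = [[] for _ in range(num_replicas // group_size)]
    for replica in range(num_replicas):
        group = (replica // block) * stride + replica % stride
        groups[group].append(replica)
    return groups


# Eight replicas, groups of four replicas taken with a stride of two.
print(strided_replica_groups(8, stride=2, group_size=4))
```

With a stride of 1 the groups are contiguous; with a larger stride, each collective runs across replicas that are spaced apart, which is the pattern needed when neighbouring replicas hold different shards of a model.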
replicated_all_gather op so that the output tensor has shape