14. Python API
Remember to import the IPU API using:
from tensorflow.python import ipu
You cannot access the IPU API via the top-level tensorflow
namespace.
For example, this will not work:
import tensorflow as tf
cfg = tf.python.ipu.create_ipu_config(...)
14.2. Compiler interface
- tensorflow.python.ipu.ipu_compiler.compile(computation, inputs=None)
Builds an operator that compiles and runs computation with the Graphcore IPU XLA backend.
- Parameters
computation – A Python function that builds a computation to apply to the input. If the function takes n inputs, inputs should be a list of n tensors. computation may return a list of operations and tensors, where the tensors must come before the operations in the returned list. The return value of compile is a list of tensors corresponding to the tensors in the output of computation. All operations returned from computation will be executed when evaluating any of the returned output tensors.
inputs – A list of inputs, or None (equivalent to an empty list). Each input can be a nested structure containing values that are convertible to tensors. Note that passing an N-dimensional list of compatible values will result in an N-dimensional list of scalar tensors rather than a single rank-N tensor. If you need different behaviour, convert parts of inputs to tensors with tf.convert_to_tensor.
- Returns
The same data structure as if computation(inputs) were called directly, with the following exceptions for correctness:
None output: a NoOp that control-depends on the computation is returned.
Single value output: a tuple containing the value is returned.
Operation-only outputs: a NoOp that control-depends on the computation is returned.
- Raises
Exception – If the computation was not compiled for an IPU device.
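For example, a minimal sketch of compiling and running a simple computation (assuming a TensorFlow 1.x environment with the Graphcore IPU extensions installed; the configuration functions used here are documented in Section 14.6):

import numpy as np
import tensorflow as tf
from tensorflow.python import ipu

def my_net(a, b):
    # Tensors must come before operations in the returned list.
    return [a + b, a * b]

with tf.device('cpu'):
    a = tf.placeholder(np.float32, [2])
    b = tf.placeholder(np.float32, [2])

with ipu.scopes.ipu_scope("/device:IPU:0"):
    res = ipu.ipu_compiler.compile(my_net, inputs=[a, b])

cfg = ipu.utils.create_ipu_config()
cfg = ipu.utils.auto_select_ipus(cfg, num_ipus=1)
ipu.utils.configure_ipu_system(cfg)

with tf.Session() as sess:
    print(sess.run(res, {a: [1.0, 2.0], b: [3.0, 4.0]}))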
14.3. Scoping contexts
- tensorflow.python.ipu.scopes.frontend_attribute(attribute_name, attribute_value, restore_to=None)
Sets the specified scope attribute to the specified value in the graph.
- Parameters
attribute_name – Name of the attribute.
attribute_value – Attribute’s value as a string.
restore_to – If the attribute would otherwise be undefined at the end of the scope, set it to this value instead.
- Returns
A context
- tensorflow.python.ipu.scopes.ipu_jit_scope(ipu_scope)
Provides a scope for compilation of operations.
If you would like to compile several sets of operations together, then this can provide that mechanism.
- Parameters
ipu_scope – A name to differentiate between different JIT scopes
- Returns
A context
- tensorflow.python.ipu.scopes.ipu_scope(device)
Provides a scope for placing operations onto a particular IPU/IPU cluster.
- Parameters
device – The name of the TensorFlow device, e.g. '/device:IPU:0'
- Returns
A context
- tensorflow.python.ipu.scopes.ipu_shard(index)
Control sharding for a set of operations.
Provides a scope which targets operations onto a particular shard (IPU) of a multi-IPU sharded device.
- Parameters
index – The index of the IPU on which to place the enclosed operations.
- Returns
A context
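A minimal sketch of sharding a computation across two IPUs (this assumes a TensorFlow device with at least two IPUs has been configured, e.g. with auto_select_ipus(opts, num_ipus=2)):

import numpy as np
import tensorflow as tf
from tensorflow.python import ipu

def sharded_net(a, b):
    with ipu.scopes.ipu_shard(0):
        x = a + b        # Placed on shard (IPU) 0 of the device.
    with ipu.scopes.ipu_shard(1):
        return x * x     # Placed on shard (IPU) 1 of the device.

with tf.device('cpu'):
    a = tf.placeholder(np.float32, [4])
    b = tf.placeholder(np.float32, [4])

with ipu.scopes.ipu_scope("/device:IPU:0"):
    res = ipu.ipu_compiler.compile(sharded_net, inputs=[a, b])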
- tensorflow.python.ipu.scopes.outside_compilation_scope(name='outside')
Provides a scope for placing operations on the host, outside the current compilation scope. The operations will be placed on the default host device. This allows for offloading computations from the IPU to the host, which can be useful for operations that are not supported or suitable for execution on the IPU.
Example:
def my_net(a):
    with ipu_scope("/device:IPU:0"):
        b = a * a
        with outside_compilation_scope():
            c = b + 2  # Placed on the host.
        d = b + c
        return d
- Parameters
name – A name for the outside compilation scope.
- Returns
A context
- tensorflow.python.ipu.scopes.partials_type(override_type)
Override the default type used to store intermediate results by some operations.
- Parameters
override_type – Numpy type of the partials (float16 or float32)
- Returns
A context
- tensorflow.python.ipu.scopes.stochastic_rounding(override)
Control stochastic rounding for a set of operations.
- Parameters
override – if True then stochastic rounding will be used, otherwise it will be disabled for this set of operations.
- Returns
A context
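Both partials_type and stochastic_rounding are used as context managers around the operations they should affect. A minimal sketch (the function would be compiled with ipu_compiler.compile as usual):

import numpy as np
import tensorflow as tf
from tensorflow.python import ipu

def my_net(x, w):
    # Store the matmul partials as float16 instead of the default float32.
    with ipu.scopes.partials_type(np.float16):
        y = tf.matmul(x, w)
    # Disable stochastic rounding for this addition only.
    with ipu.scopes.stochastic_rounding(False):
        return y + 1.0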
14.4. Infeed queue
- class tensorflow.python.ipu.ipu_infeed_queue.IPUInfeedQueue(dataset, feed_name, device_ordinal=0, replication_factor=1, data_to_prefetch=1, prefetch_depth=None)
Wraps a tf.data.Dataset object with infeed operations specific to the IPU.
This class, together with tensorflow.python.ipu.loops, is used to create a data pipeline from a dataset into a training/inference loop on the IPU inside a single session.run call, which reduces the overhead of calling session.run for each iteration of the loop.
You should pass the infeed queue as an argument to a loop from tensorflow.python.ipu.loops. These loops will then handle dequeuing the data to the device automatically.
The feed_name allows individual feeds to be named. When including more than one feed in the same graph, each should be named independently.
The following skeleton shows how to use this class when building a training loop. Note how the body signature contains variables which correspond to the nested structure of tf.Tensor objects representing the next element in the infeed queue:

# Create an example dataset.
dataset = ...  # A `tf.data.Dataset` object.

def dataset_parser(value):
    features, labels = parse_record(value)
    return {"features": features, "labels": labels}

# The resulting dataset has a nested structure of: {features, labels}.
dataset = dataset.map(dataset_parser)

infeed_queue = ipu.ipu_infeed_queue.IPUInfeedQueue(dataset, feed_name="training_infeed")

# dataset can no longer be used beyond this point.

def my_net():
    # Note how the nested structure forms part of the loop body signature.
    def body(loss, features, labels):
        with variable_scope.variable_scope("vs", use_resource=True):
            y = tf.conv2d(features, .....)
            ...
            ...
            logits = tf.nn.xw_plus_b(....)
            loss = tf.reduce_mean(
                tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits,
                                                           labels=labels))
            optimizer = gradient_descent.GradientDescentOptimizer(0.000001)
            train = optimizer.minimize(loss)
            with ops.control_dependencies([train]):
                return array_ops.identity(loss)

    loss = 0.0
    return ipu.loops.repeat(10000, body, [loss], infeed_queue)

with ipu.scopes.ipu_scope("/device:IPU:0"):
    res = ipu_compiler.compile(my_net, inputs=[])

with tf.Session() as sess:
    sess.run(infeed_queue.initializer)
    sess.run(variables.global_variables_initializer())
    result = sess.run(res)
- __init__(dataset, feed_name, device_ordinal=0, replication_factor=1, data_to_prefetch=1, prefetch_depth=None)
Creates an IPUInfeedQueue object.
- Parameters
dataset – a tf.data.Dataset object. All transformations (e.g. shuffle, repeat, batch) must be applied before passing the dataset to this function. The dataset can no longer be used after creating this queue.
feed_name – the name of the infeed queue. This must be unique between all IPUInfeedQueues and IPUOutfeedQueues.
device_ordinal – ordinal of the IPU device on which this queue will be used. By default the queue will be used on "/device/IPU:0".
replication_factor – the number of replicated graphs this infeed will be used in.
data_to_prefetch – the amount of data to prefetch. Defaults to 1 (no prefetch). If set to a value other than 0 or 1, each time we sync with the CPU we will return this number of dataset values rather than 1. This must not exceed the size of the dataset if it is not repeating. The infeed is incremented by this amount each time, so when using the infeed in multiple programs or loops, note that if data_to_prefetch is not a factor of the previous number of iterations then the next loop/program will not start at the iteration it otherwise would. Prefetching increases memory usage, since more batches are live at a given point, but should give a speed-up by requiring fewer round trips to host memory. A larger number of batches may need to be prefetched at once before any benefit is seen, as the lookup itself has some overhead from internal copies.
prefetch_depth – the number of elements Poplar will prefetch. This is the depth of the Poplar datastream buffer which may be filled before being read by the device. By default the prefetch_depth is determined automatically. Increasing prefetch_depth allows multiple entries to be prefetched, increasing the probability that there will be a valid entry in the buffer for the device to read before it falls back to synchronously fetching the next entry.
- Raises
ValueError – if any dimension of dataset.output_shapes is not fully defined. The tf.data.Dataset.batch function must be called with drop_remainder=True to ensure that the batch size is constant.
- property deleter
A tf.Operation that can be run to delete the resources owned by this IPUInfeedQueue. This allows creating a new IPUInfeedQueue with the same name afterwards.
- Returns
A tf.Operation that can be run to delete this IPUInfeedQueue
- property dequeued
Returns whether this queue has been dequeued.
- Returns
A nested structure of tf.Tensor objects.
- get_next()
Obsolete function.
- property initializer
A tf.Operation that should be run to initialize this IPUInfeedQueue.
- Returns
A tf.Operation that should be run to initialize this IPUInfeedQueue
- Raises
ValueError – if the initializer has already been called.
- property number_of_tuple_elements
Returns the number of arguments supplied by this IPUInfeedQueue.
14.5. Outfeed queue
- class tensorflow.python.ipu.ipu_outfeed_queue.IPUOutfeedMode(value)
Types used to control the IPUOutfeedQueue modes.
Contains the following values:
ALL – When used with an IPUOutfeedQueue, all the elements which were enqueued to the queue will be returned by the outfeed.
LAST – When used with an IPUOutfeedQueue, only the last element which was enqueued to the queue will be returned by the outfeed.
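For example, a sketch of creating a queue which only keeps the most recently enqueued element (complete enqueue/dequeue examples are shown under IPUOutfeedQueue below):

from tensorflow.python import ipu

outfeed_queue = ipu.ipu_outfeed_queue.IPUOutfeedQueue(
    feed_name="outfeed",
    outfeed_mode=ipu.ipu_outfeed_queue.IPUOutfeedMode.LAST)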
- class tensorflow.python.ipu.ipu_outfeed_queue.IPUOutfeedQueue(feed_name, outfeed_mode=None, outfeed_all=None, device_ordinal=0, replication_factor=1, io_batch_size=1)
Generates and adds outfeed enqueue/dequeue operations to the graph.
An outfeed is the counterpart to an infeed and manages the transfer of data (like tensors, tuples or dictionaries of tensors) from the IPU graph to the host.
The queue has two modes of operation - outfeed all or outfeed last. In outfeed all mode every element that is enqueued will be stored for a subsequent dequeue. All of the enqueued elements will be returned when the dequeue operation is run. This is the default behaviour.
In outfeed last mode only the last enqueued element is stored. The dequeue operation will in this case return a single element.
- __init__(feed_name, outfeed_mode=None, outfeed_all=None, device_ordinal=0, replication_factor=1, io_batch_size=1)
Creates an IPUOutfeedQueue object. (deprecated arguments)
Warning: SOME ARGUMENTS ARE DEPRECATED: (outfeed_all). They will be removed in a future version. Instructions for updating: use outfeed_mode instead.
- Parameters
feed_name – a user-provided name for the outfeed operation. Must be unique within all IPUOutfeedQueue and IPUInfeedQueue operations.
outfeed_mode – an ipu_outfeed_queue.IPUOutfeedMode value used to control the outfeed behaviour. If not specified then all elements will be returned by the outfeed when the dequeue operation is run.
outfeed_all – deprecated.
device_ordinal – ordinal of the IPU device on which this queue will be used. By default the queue will be used on "/device/IPU:0".
replication_factor – the number of replicated graphs this outfeed will be used in.
io_batch_size – output tensors will be batched into this number of samples before being sent to the host. This reduces the amount of device-to-host communication at the expense of needing to store the tensors on the device, plus the extra computation required to perform the batching.
- Raises
ValueError – if the types or values are incorrect
- property deleter
A tf.Operation that can be run to delete the resources owned by this IPUOutfeedQueue. This allows creating a new IPUOutfeedQueue with the same name afterwards. The behaviour is undefined if this op is executed concurrently with the dequeue op.
- Returns
A tf.Operation that can be run to delete this IPUOutfeedQueue
- dequeue()
Generate host side operation to dequeue the outfeed values. The operation generated by this function will block if called prior to any enqueues.
The return value of this operation depends on the enqueued tensors, replication factor and the execution mode.
Examples:
Outfeed returning a single tensor:
outfeed_queue = ipu_outfeed_queue.IPUOutfeedQueue(feed_name="outfeed", replication_factor=2)

def body(input):
    output = input + 1
    outfeed = outfeed_queue.enqueue(output)
    return (output, outfeed)

def my_net(input):
    r = loops.repeat(20, body, (input))
    return r

with ops.device('cpu'):
    v = tf.placeholder(np.float32, [4, 4])

with ipu.scopes.ipu_scope("/device:IPU:0"):
    res = ipu_compiler.compile(my_net, inputs=[v])

outfeed = outfeed_queue.dequeue()

with tf.Session() as sess:
    result = sess.run(res, {v: np.ones([4, 4], np.float32)})
    outfed = sess.run(outfeed)

In this example the tensor output is of shape [4, 4] and it is enqueued into the outfeed with replication_factor = 2. If the outfeed_mode is IPUOutfeedMode.ALL, then the shape of the resulting outfed tensor will be [20, 2, 4, 4], where the first dimension represents the number of times we have enqueued a tensor to the outfeed - in this example the loop is repeated 20 times, so we get 20 values back from the outfeed. The second dimension is the replication_factor, which allows us to see the individual values from each replicated graph. If the outfeed_mode is IPUOutfeedMode.LAST, then the shape of the resulting outfed tensor will be [2, 4, 4], which represents the value of the output tensor the last time it was enqueued during execution, for each of the replicated graphs.
Outfeed returning a tuple of tensors:
outfeed_queue = ipu_outfeed_queue.IPUOutfeedQueue(feed_name="outfeed")

def body(input):
    output = input + 1
    sum = tf.reduce_sum(output)
    outfeed = outfeed_queue.enqueue((output, sum))
    return (output, outfeed)

def my_net(input):
    r = loops.repeat(20, body, (input))
    return r

with ops.device('cpu'):
    v = tf.placeholder(np.float32, [4, 4])

with ipu.scopes.ipu_scope("/device:IPU:0"):
    res = ipu_compiler.compile(my_net, inputs=[v])

outfeed = outfeed_queue.dequeue()

with tf.Session() as sess:
    result = sess.run(res, {v: np.ones([4, 4], np.float32)})
    outfed = sess.run(outfeed)

In this example we outfeed a tuple of tensors, output and sum, where the former is of shape [4, 4] and the latter [1]. If the outfeed_mode is IPUOutfeedMode.ALL, then the resulting outfed is a two-tuple of tensors with shapes ([20, 4, 4], [20, 1]), where the first dimension in each of the tensors represents the number of times we have enqueued these tensors to the outfeed - in this example the loop is repeated 20 times, so we get 20 values back from the outfeed for each of the tensors in the tuple. If the outfeed_mode is IPUOutfeedMode.LAST, then the outfed is a two-tuple of tensors with shapes ([4, 4], [1]), which represents the values of the output and sum tensors the last time they were enqueued during execution.
Note that replication_factor here is the default (1), which means that the extra replication dimension is not added.
Outfeed returning a dictionary of tensors:
outfeed_queue = ipu_outfeed_queue.IPUOutfeedQueue(feed_name="outfeed", replication_factor=8)

def body(input):
    output = input + 1
    sum = tf.reduce_sum(output)
    outfeed = outfeed_queue.enqueue({"x": output, "y": sum})
    return (output, outfeed)

def my_net(input):
    r = loops.repeat(40, body, (input))
    return r

with ops.device('cpu'):
    v = tf.placeholder(np.float32, [4, 4])

with ipu.scopes.ipu_scope("/device:IPU:0"):
    res = ipu_compiler.compile(my_net, inputs=[v])

outfeed = outfeed_queue.dequeue()

with tf.Session() as sess:
    result = sess.run(res, {v: np.ones([4, 4], np.float32)})
    outfed = sess.run(outfeed)

In this example we outfeed a dictionary of tensors, output and sum, where the former is of shape [4, 4] and the latter [1]. If the outfeed_mode is IPUOutfeedMode.ALL, then the resulting outfed is a dictionary of tensors with shapes {"x": [40, 8, 4, 4], "y": [40, 8, 1]}, where the first dimension in each of the tensors represents the number of times we have enqueued these tensors to the outfeed - in this example the loop is repeated 40 times, so we get 40 values back from the outfeed for each of the tensors in the dictionary. The second dimension is the replication_factor, which allows us to see the individual values from each replicated graph. If the outfeed_mode is IPUOutfeedMode.LAST, then the outfed is a dictionary of tensors with shapes {"x": [8, 4, 4], "y": [8, 1]}, which represents the values of the output and sum tensors the last time they were enqueued during execution, for each of the replicated graphs.
- enqueue(tensors)
Enqueue a tensor, tuple or dictionary of tensors to be outfed from the IPU graph. This operation is placed on the IPU device. The function returns an Operation which needs to be executed (by either returning it or using tf.control_dependencies(...)).
Examples:
Outfeed returning a single tensor:
outfeed_queue = ipu_outfeed_queue.IPUOutfeedQueue(feed_name="outfeed")

def body(v):
    v = v + 1
    outfeed = outfeed_queue.enqueue(v)
    return (v, outfeed)

def my_net(v):
    r = loops.repeat(20, body, (v))
    return r

with ipu.scopes.ipu_scope("/device:IPU:0"):
    res = ipu_compiler.compile(my_net, inputs=[v])
...
...
Outfeed returning a tuple of tensors:
outfeed_queue = ipu_outfeed_queue.IPUOutfeedQueue(feed_name="outfeed")

def body(v):
    v = v + 1
    x = v * 2
    outfeed = outfeed_queue.enqueue((v, x))
    return (v, outfeed)

def my_net(v):
    r = loops.repeat(20, body, (v))
    return r

with ipu.scopes.ipu_scope("/device:IPU:0"):
    res = ipu_compiler.compile(my_net, inputs=[v])
...
...
Outfeed returning a dictionary of tensors:
outfeed_queue = ipu_outfeed_queue.IPUOutfeedQueue(feed_name="outfeed")

def body(v):
    v = v + 1
    x = v * 2
    outfeed = outfeed_queue.enqueue({"output_1": v, "output_2": x})
    return (v, outfeed)

def my_net(v):
    r = loops.repeat(20, body, (v))
    return r

with ipu.scopes.ipu_scope("/device:IPU:0"):
    res = ipu_compiler.compile(my_net, inputs=[v])
...
...
14.6. General utilities
- class tensorflow.python.ipu.utils.DeviceConnectionType(value)
Enumeration to describe the mechanism used to attach to the Poplar device.
ALWAYS indicates that the system will attach when configuring the device.
ON_DEMAND will defer connection to when the IPU is needed.
NEVER will never try to attach to a device. Used when compiling offline.
- class tensorflow.python.ipu.utils.ExecutionProfileType(value)
The execution profile type indicates the desired information in the execution profile.
NO_PROFILE indicates that there should be no execution profiling.
DEVICE_PROFILE indicates that the execution profile should contain only device-wide events.
IPU_PROFILE indicates that the profile should contain IPU-level execution events.
TILE_PROFILE indicates that the profile should contain tile-level execution events.
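A sketch of requesting device-wide execution profiling via create_ipu_config (described in Section 14.6):

from tensorflow.python import ipu

opts = ipu.utils.create_ipu_config(
    profiling=True,
    profile_execution=ipu.utils.ExecutionProfileType.DEVICE_PROFILE)
ipu.utils.configure_ipu_system(opts)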
- class tensorflow.python.ipu.utils.SelectionOrder(value)
Depending on the communication pattern of the model, the order in which the IPUs are selected and mapped to shards can impact the performance.
For example, given a model which executes on multiple IPUs:
def sharded_graph(pa, pb, pc, pd):
    with ipu.scopes.ipu_shard(0):
        o1 = pa + pb
    with ipu.scopes.ipu_shard(1):
        o2 = o1 + pc
    with ipu.scopes.ipu_shard(2):
        o3 = o2 + pd
    return o3
and a typical machine with 8 Graphcore C2 cards:
 _______               _______
|       |             |       |
|  14   |=============|  15   |
|_______|             |_______|
    ||                    ||
 _______               _______
|       |             |       |
|  12   |=============|  13   |
|_______|             |_______|
    ||                    ||
 _______               _______
|       |             |       |
|  10   |=============|  11   |
|_______|             |_______|
    ||                    ||
 _______               _______
|       |             |       |
|   8   |=============|   9   |
|_______|             |_______|
    ||                    ||
 _______               _______
|       |             |       |
|   6   |=============|   7   |
|_______|             |_______|
    ||                    ||
 _______               _______
|       |             |       |
|   4   |=============|   5   |
|_______|             |_______|
    ||                    ||
 _______               _______
|       |             |       |
|   2   |=============|   3   |
|_______|             |_______|
    ||                    ||
 _______               _______
|       |             |       |
|   0   |=============|   1   |
|_______|             |_______|
(where each numbered square represents an IPU with the given device ID and the == and || connections represent IPUs being directly connected via IPU-Links)
we can see that ipu_shard(0) directly communicates with ipu_shard(1), and that ipu_shard(1) directly communicates with ipu_shard(2). If shards 0, 1, 2 were mapped to IPUs 0, 1, 2 in that order, then the communication between shards 1 and 2 would not have a direct connection via an IPU-Link and would have to perform a "hop" via another IPU. If shards 0, 1, 2 were instead mapped to IPUs 0, 1, 3 in that order, then the communication between shards 1 and 2 would have a direct connection via an IPU-Link, which will reduce the communication cost.
This Enum class is used to control the order in which the IPUs are selected. Currently, the following IPU selection orderings are supported:
AUTO: automatically try to select the best ordering given the network.
ZIGZAG: follow the natural ordering of IPUs. In the above example, the IPUs would be selected in the following order: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15.
SNAKE: select IPUs such that each consecutive shard is directly connected via IPU-Links to the shard before and after. In the above example, the IPUs would be selected in the following order: 0, 1, 3, 2, 4, 5, 7, 6, 8, 9, 11, 10, 12, 13, 15, 14.
HOOF: select IPUs such that each consecutive shard is directly connected via IPU-Links to the shard before and after, and the last and first shards are on the same C2 card. In the above example, the IPUs would be selected in the following order: 0, 2, 4, 6, 8, 10, 12, 14, 15, 13, 11, 9, 7, 5, 3, 1.
The SNAKE and HOOF IPU selection orders are particularly beneficial for pipelined models.
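A sketch of requesting the SNAKE selection order for a model sharded or pipelined over four IPUs:

from tensorflow.python import ipu

opts = ipu.utils.create_ipu_config(
    selection_order=ipu.utils.SelectionOrder.SNAKE)
opts = ipu.utils.auto_select_ipus(opts, num_ipus=4)
ipu.utils.configure_ipu_system(opts)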
- class tensorflow.python.ipu.utils.VerificationOptions
Stores pairs of key/id to use for each type of data used in the graph. Does nothing unless verified transfers have been enabled by calling set_transfer_options(opts, use_verified_transfers=True) and an instance of this class has been set by calling set_verification_options:

o = VerificationOptions()
o.inputs.key = 1
o.infeeds["infeed"].key = 3
set_verification_options(opts, o)
- tensorflow.python.ipu.utils.auto_select_ipus(opts, num_ipus)
Configure the IPUs to be used by the session.
The configuration describes a system consisting of multiple TensorFlow devices, each with control of one or more IPUs. The devices will be labelled /device:IPU:0, /device:IPU:1 and so on.
Each device can control a specific number of IPUs, given by the num_ipus parameter. The system will automatically select IPU configurations from the available IPUs which match the desired number of IPUs.
Examples:

# Create a single device, with one IPU
opts = create_ipu_config()
opts = auto_select_ipus(opts, num_ipus=1)
ipu.utils.configure_ipu_system(opts)
with tf.Session() as s:
  ...

# Create two devices, with 2 IPUs per device.
opts = create_ipu_config()
opts = auto_select_ipus(opts, num_ipus=[2, 2])
ipu.utils.configure_ipu_system(opts)
with tf.Session() as s:
  ...

# Create two devices, with 1 IPU in the first device and 2 IPUs
# in the second device.
opts = create_ipu_config()
opts = auto_select_ipus(opts, num_ipus=[1, 2])
ipu.utils.configure_ipu_system(opts)
with tf.Session() as s:
  ...
- Parameters
opts – An IpuOptions session control protobuf.
num_ipus – List of IPUs per Tensorflow device
- Returns
The IpuOptions configuration protobuf, configured for auto-selecting a set of IPU devices.
- tensorflow.python.ipu.utils.configure_ipu_system(config, device='cpu')
Configure an IPU system, passing an IpuOptions protobuf created by the create_ipu_config function.
- Parameters
config – An IpuOptions configuration protobuf
device – The CPU device which is local to the IPU hardware
- Returns
None
- tensorflow.python.ipu.utils.create_ipu_config(profiling=False, enable_ipu_events=False, use_poplar_text_report=False, use_poplar_cbor_report=False, profile_execution=None, enable_poplar_serialized_graph=False, report_every_nth_execution=0, max_report_size=268435456, report_directory='', scheduler_selection='', always_rearrange_copies_on_the_host=False, merge_infeed_io_copies=False, disable_graph_convolution_caching=False, disable_graph_outlining=False, retain_control_dependencies=False, max_cross_replica_sum_buffer_size=0, max_inter_ipu_copies_buffer_size=0, max_scheduler_lookahead_depth=5, max_scheduler_search_space_size=64, prefetch_data_streams=True, selection_order=None, enable_experimental_remote_buffer_embedding=False)
Create an empty IPU session configuration structure. (deprecated arguments)
Warning: SOME ARGUMENTS ARE DEPRECATED: (max_cross_replica_sum_buffer_size, max_inter_ipu_copies_buffer_size). They will be removed in a future version. Instructions for updating: use set_optimization_options() instead.
- Parameters
profiling – Enable compilation reports, and IPU trace events.
enable_ipu_events – Enable IPU trace events without poplar reports.
use_poplar_text_report – Enable the Poplar textual report summary.
use_poplar_cbor_report – Enable the Poplar CBOR reports.
profile_execution – Include Poplar execution profiles in the execution events. Can only be enabled if profiling is also enabled. If set, can be True, False, or a member of the ExecutionProfileType enumeration. A True value indicates ExecutionProfileType.DEVICE_PROFILE.
enable_poplar_serialized_graph – Create the Poplar serialized graph and include it in the IPU compilation trace events.
report_every_nth_execution – Only produce an execution report on every Nth execution. 0 = One report only.
max_report_size – The maximum size of Poplar profiles to include in the profile events.
report_directory – When set, reports will be written to files in this directory, instead of being written into the events. The events will contain the full paths of the report files.
scheduler_selection – When set, this forces the compiler to use a specific scheduler when ordering the instructions. See the documentation for a list of valid schedulers.
always_rearrange_copies_on_the_host – *Experimental flag* The data which is streamed to/from the device might be stored in different layouts on the device and on the host. If that is the case, the rearrangement is performed on the device by default. Enabling this option moves the rearrangement to the host, at the expense of latency.
merge_infeed_io_copies – When true, this flag will merge the streamed host->device input copies into one larger copy. This may reduce the time to copy data from the host, at the expense of increasing the live tensor memory on the device.
disable_graph_convolution_caching – By default, the convolution operation searches for an equivalent cached operation, and uses this instead of creating a new convolution. Setting this flag forces the creation of a new convolution. This can improve runtime at the expense of graph size.
disable_graph_outlining – By default, some operations, such as matrix multiplications, which occur in the graph multiple times but with different input tensors might be optimised to reduce the total code size of the graph at the expense of the execution time. Setting this flag will disable these optimisations. This option is not valid for the convolution operation (also see disable_graph_convolution_caching)
retain_control_dependencies – When set to true, control dependencies from the Tensorflow graph are passed through to the backend. This can result in a different memory size due to differing constraints on the operation scheduler.
max_cross_replica_sum_buffer_size – The maximum number of bytes that can be waiting before a cross replica sum op is scheduled.
max_inter_ipu_copies_buffer_size – The maximum number of bytes that can be waiting before an inter-IPU copy is scheduled.
max_scheduler_lookahead_depth – The maximum distance to look into the future when considering valid schedules.
max_scheduler_search_space_size – The maximum number of nodes to consider when building the tree of future schedules.
prefetch_data_streams – When set to true, the prefetching of data for data streams on the host will be overlapped with execution on the IPU.
selection_order – the order in which IPUs are selected and mapped to physical IPU devices when using multi-IPU devices (see SelectionOrder). When not specified, the automatic selection order is used; otherwise pass an instance of SelectionOrder.
enable_experimental_remote_buffer_embedding – When set to true, HostEmbedding will make use of Poplar remote buffers.
- Returns
An IpuOptions configuration protobuf, suitable for passing to configure_ipu_system
- tensorflow.python.ipu.utils.export_dataset_to_file(dataset_or_infeed, output_filename, num_elements, feed_name='', apply_options=True)
Export num_elements from the given dataset or infeed, as binary, to the specified output_filename.
If the infeed elements are tuples then one file per tuple element will be created. For example, if dataset looks like
[{ "a": A_0, "b": B_0}, { "a": A_1, "b": B_1}, ...]
then export_dataset_to_file(dataset, "my_dataset.bin", 100) will generate:
my_dataset.0.bin   # Contains tensors [ A_0, A_1, ..., A_99]
my_dataset.1.bin   # Contains tensors [ B_0, B_1, ..., B_99]
- Parameters
dataset_or_infeed – A unary dataset with the same input and output structure, or an IPUInfeedQueue.
output_filename – Where to export the tensors to.
num_elements – Number of elements to export from the dataset.
feed_name – Specify the feed name.
apply_options – Whether to apply optimization options which can improve the dataset performance.
- tensorflow.python.ipu.utils.export_inputs_to_file(inputs, output_filename, feed_dict)
Export as binary the list of inputs provided, to the specified output_filename.
- Parameters
inputs – List of graph inputs to export.
output_filename – Where to export the tensors to.
feed_dict – Feed dictionary containing the inputs’ values.
- tensorflow.python.ipu.utils.extract_all_events(events)
Extract a list containing each event as an event object
- Parameters
events – A tensor containing a list of IPU events as protobuf strings
- Returns
A list containing IpuTraceEvent objects
- tensorflow.python.ipu.utils.extract_all_strings_from_event_trace(events)
Extract a concatenation of all data strings from an IPU event trace.
- Parameters
events – An array of IPU events as returned from the ipu_compile_summary operation.
- Returns
A string containing the concatenation of all of the data fields of the events.
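A sketch of collecting trace events and extracting their strings. This assumes profiling was enabled with create_ipu_config(profiling=True), and that the ipu_event_trace() op is available from the internal gen_ipu_ops module (as used in the Graphcore examples):

import tensorflow as tf
from tensorflow.python import ipu
from tensorflow.compiler.plugin.poplar.ops import gen_ipu_ops

# Dequeues all IPU trace events generated since it was last run.
report = gen_ipu_ops.ipu_event_trace()

with tf.Session() as sess:
    # ... run the compiled model here ...
    raw_events = sess.run(report)
    print(ipu.utils.extract_all_strings_from_event_trace(raw_events))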
- tensorflow.python.ipu.utils.extract_all_types_from_event_trace(events)
Return a list of the types of each event in an event trace tensor
- Parameters
events – A tensor containing a list of IPU events as protobuf strings
- Returns
A list containing the type of each event
- tensorflow.python.ipu.utils.extract_compile_reports(events)
Get a list of all compiler reports in the event list.
- Parameters
events – A list of trace event serialized protobufs.
- Returns
A list of tuples containing the module name and report.
- tensorflow.python.ipu.utils.extract_execute_reports(events)
Get a list of all execution reports in the event list.
- Parameters
events – A list of trace event serialized protobufs.
- Returns
A list of tuples containing the module name and report.
- tensorflow.python.ipu.utils.extract_poplar_serialized_graphs(events)
Get a list of all poplar serialized graphs in the event list.
- Parameters
events – A list of trace event serialized protobufs.
- Returns
A list of tuples containing the module name and serialized graph.
- tensorflow.python.ipu.utils.get_ipu_config(session=None)
Get the configuration of an IPU system.
- Parameters
session – An optional session on which to execute.
- Returns
A list of IpuOption instances, one for each PoplarExecutor.
- tensorflow.python.ipu.utils.get_num_of_ipus_in_device(ipu_device, device='cpu')
Get the number of physical IPUs
- Parameters
ipu_device – The IPU device for which to get the number of physical IPUs.
device – The CPU device which is local to the IPU hardware.
- Returns
A number of physical IPUs configured for a particular TF device.
- tensorflow.python.ipu.utils.move_variable_initialization_to_cpu(graph=None)
For all variables in the VARIABLES collection, move any initialization ops onto the CPU.
- Parameters
graph – Operations are moved around on this graph. The default graph will be used if not specified.
- Returns
None
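A minimal sketch, assuming a resource variable created under an IPU device scope:

import tensorflow as tf
from tensorflow.python import ipu

with ipu.scopes.ipu_scope("/device:IPU:0"):
    w = tf.get_variable("w", shape=[2, 2], use_resource=True)

# Move the initialization ops for `w` onto the CPU.
ipu.utils.move_variable_initialization_to_cpu()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())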
- tensorflow.python.ipu.utils.reset_ipu_seed(seed, device='/device:IPU:0', cpu_device='cpu')
Reset the seed used to generate stateful random numbers and perform stochastic rounding.
- Parameters
seed – The new random number generator seed.
device – The device to which the seed will be applied.
cpu_device – The CPU device which is on the same hardware as the IPU device.
- Returns
None
- tensorflow.python.ipu.utils.running_on_ipu_model()
Check if XLA is configured to run on the IPU Model.
- Returns
True if XLA is configured to run on the IPU Model. False if XLA is configured to run on real hardware.
- tensorflow.python.ipu.utils.select_ipus(opts, indices)
Configure the IPUs to be used by the session.
The configuration describes a system consisting of multiple TensorFlow devices, each with control of one or more IPUs. The TensorFlow devices will be labelled /device:IPU:0, /device:IPU:1 and so on.
Each TensorFlow device uses a specific configuration consisting of one or more IPUs from the list of devices. These can be found by running the Graphcore utility gc-info -l. For instance, the following listing shows the device configurations available on a system with 16 IPUs.

user@host:~$ gc-info -l
Graphcore device listing:

-+- Id: [0], type: [PCIe], PCI Domain: [0000:1a:00.0]
-+- Id: [1], type: [PCIe], PCI Domain: [0000:1b:00.0]
-+- Id: [2], type: [PCIe], PCI Domain: [0000:23:00.0]
-+- Id: [3], type: [PCIe], PCI Domain: [0000:24:00.0]
-+- Id: [4], type: [PCIe], PCI Domain: [0000:3d:00.0]
-+- Id: [5], type: [PCIe], PCI Domain: [0000:3e:00.0]
-+- Id: [6], type: [PCIe], PCI Domain: [0000:43:00.0]
-+- Id: [7], type: [PCIe], PCI Domain: [0000:44:00.0]
-+- Id: [8], type: [PCIe], PCI Domain: [0000:8b:00.0]
-+- Id: [9], type: [PCIe], PCI Domain: [0000:8c:00.0]
-+- Id: [10], type: [PCIe], PCI Domain: [0000:8e:00.0]
-+- Id: [11], type: [PCIe], PCI Domain: [0000:8f:00.0]
-+- Id: [12], type: [PCIe], PCI Domain: [0000:b8:00.0]
-+- Id: [13], type: [PCIe], PCI Domain: [0000:b9:00.0]
-+- Id: [14], type: [PCIe], PCI Domain: [0000:ba:00.0]
-+- Id: [15], type: [PCIe], PCI Domain: [0000:bb:00.0]
-+- Id: [16], type: [Multi IPU]
 |--- PCIe Id: [5], DNC Id: [0], PCI Domain: [0000:3e:00.0]
 |--- PCIe Id: [7], DNC Id: [1], PCI Domain: [0000:44:00.0]
-+- Id: [17], type: [Multi IPU]
 |--- PCIe Id: [4], DNC Id: [0], PCI Domain: [0000:3d:00.0]
 |--- PCIe Id: [6], DNC Id: [1], PCI Domain: [0000:43:00.0]
-+- Id: [18], type: [Multi IPU]
 |--- PCIe Id: [3], DNC Id: [0], PCI Domain: [0000:24:00.0]
 |--- PCIe Id: [1], DNC Id: [1], PCI Domain: [0000:1b:00.0]
-+- Id: [19], type: [Multi IPU]
 |--- PCIe Id: [2], DNC Id: [0], PCI Domain: [0000:23:00.0]
 |--- PCIe Id: [0], DNC Id: [1], PCI Domain: [0000:1a:00.0]
-+- Id: [20], type: [Multi IPU]
 |--- PCIe Id: [13], DNC Id: [0], PCI Domain: [0000:b9:00.0]
 |--- PCIe Id: [15], DNC Id: [1], PCI Domain: [0000:bb:00.0]
-+- Id: [21], type: [Multi IPU]
 |--- PCIe Id: [12], DNC Id: [0], PCI Domain: [0000:b8:00.0]
 |--- PCIe Id: [14], DNC Id: [1], PCI Domain: [0000:ba:00.0]
-+- Id: [22], type: [Multi IPU]
 |--- PCIe Id: [9], DNC Id: [0], PCI Domain: [0000:8c:00.0]
 |--- PCIe Id: [11], DNC Id: [1], PCI Domain: [0000:8f:00.0]
-+- Id: [23], type: [Multi IPU]
 |--- PCIe Id: [10], DNC Id: [0], PCI Domain: [0000:8e:00.0]
 |--- PCIe Id: [8], DNC Id: [1], PCI Domain: [0000:8b:00.0]
-+- Id: [24], type: [Multi IPU]
 |--- PCIe Id: [5], DNC Id: [0], PCI Domain: [0000:3e:00.0]
 |--- PCIe Id: [7], DNC Id: [1], PCI Domain: [0000:44:00.0]
 |--- PCIe Id: [4], DNC Id: [2], PCI Domain: [0000:3d:00.0]
 |--- PCIe Id: [6], DNC Id: [3], PCI Domain: [0000:43:00.0]
-+- Id: [25], type: [Multi IPU]
 |--- PCIe Id: [3], DNC Id: [0], PCI Domain: [0000:24:00.0]
 |--- PCIe Id: [1], DNC Id: [1], PCI Domain: [0000:1b:00.0]
 |--- PCIe Id: [2], DNC Id: [2], PCI Domain: [0000:23:00.0]
 |--- PCIe Id: [0], DNC Id: [3], PCI Domain: [0000:1a:00.0]
-+- Id: [26], type: [Multi IPU]
 |--- PCIe Id: [13], DNC Id: [0], PCI Domain: [0000:b9:00.0]
 |--- PCIe Id: [15], DNC Id: [1], PCI Domain: [0000:bb:00.0]
 |--- PCIe Id: [12], DNC Id: [2], PCI Domain: [0000:b8:00.0]
 |--- PCIe Id: [14], DNC Id: [3], PCI Domain: [0000:ba:00.0]
-+- Id: [27], type: [Multi IPU]
 |--- PCIe Id: [9], DNC Id: [0], PCI Domain: [0000:8c:00.0]
 |--- PCIe Id: [11], DNC Id: [1], PCI Domain: [0000:8f:00.0]
 |--- PCIe Id: [10], DNC Id: [2], PCI Domain: [0000:8e:00.0]
 |--- PCIe Id: [8], DNC Id: [3], PCI Domain: [0000:8b:00.0]
-+- Id: [28], type: [Multi IPU]
 |--- PCIe Id: [5], DNC Id: [0], PCI Domain: [0000:3e:00.0]
 |--- PCIe Id: [7], DNC Id: [1], PCI Domain: [0000:44:00.0]
 |--- PCIe Id: [4], DNC Id: [2], PCI Domain: [0000:3d:00.0]
 |--- PCIe Id: [6], DNC Id: [3], PCI Domain: [0000:43:00.0]
 |--- PCIe Id: [3], DNC Id: [4], PCI Domain: [0000:24:00.0]
 |--- PCIe Id: [1], DNC Id: [5], PCI Domain: [0000:1b:00.0]
 |--- PCIe Id: [2], DNC Id: [6], PCI Domain: [0000:23:00.0]
 |--- PCIe Id: [0], DNC Id: [7], PCI Domain: [0000:1a:00.0]
-+- Id: [29], type: [Multi IPU]
 |--- PCIe Id: [13], DNC Id: [0], PCI Domain: [0000:b9:00.0]
 |--- PCIe Id: [15], DNC Id: [1], PCI Domain: [0000:bb:00.0]
 |--- PCIe Id: [12], DNC Id: [2], PCI Domain: [0000:b8:00.0]
 |--- PCIe Id: [14], DNC Id: [3], PCI Domain: [0000:ba:00.0]
 |--- PCIe Id: [9], DNC Id: [4], PCI Domain: [0000:8c:00.0]
 |--- PCIe Id: [11], DNC Id: [5], PCI Domain: [0000:8f:00.0]
 |--- PCIe Id: [10], DNC Id: [6], PCI Domain: [0000:8e:00.0]
 |--- PCIe Id: [8], DNC Id: [7], PCI Domain: [0000:8b:00.0]
-+- Id: [30], type: [Multi IPU]
 |--- PCIe Id: [5], DNC Id: [0], PCI Domain: [0000:3e:00.0]
 |--- PCIe Id: [7], DNC Id: [1], PCI Domain: [0000:44:00.0]
 |--- PCIe Id: [4], DNC Id: [2], PCI Domain: [0000:3d:00.0]
 |--- PCIe Id: [6], DNC Id: [3], PCI Domain: [0000:43:00.0]
 |--- PCIe Id: [3], DNC Id: [4], PCI Domain: [0000:24:00.0]
 |--- PCIe Id: [1], DNC Id: [5], PCI Domain: [0000:1b:00.0]
 |--- PCIe Id: [2], DNC Id: [6], PCI Domain: [0000:23:00.0]
 |--- PCIe Id: [0], DNC Id: [7], PCI Domain: [0000:1a:00.0]
 |--- PCIe Id: [13], DNC Id: [8], PCI Domain: [0000:b9:00.0]
 |--- PCIe Id: [15], DNC Id: [9], PCI Domain: [0000:bb:00.0]
 |--- PCIe Id: [12], DNC Id: [10], PCI Domain: [0000:b8:00.0]
 |--- PCIe Id: [14], DNC Id: [11], PCI Domain: [0000:ba:00.0]
 |--- PCIe Id: [9], DNC Id: [12], PCI Domain: [0000:8c:00.0]
 |--- PCIe Id: [11], DNC Id: [13], PCI Domain: [0000:8f:00.0]
 |--- PCIe Id: [10], DNC Id: [14], PCI Domain: [0000:8e:00.0]
 |--- PCIe Id: [8], DNC Id: [15], PCI Domain: [0000:8b:00.0]
Examples based on the listing above:
# Create a single device with 1 IPU at PCI address 0000:1a:00.0 by using
# IPU configuration index 0
opts = create_ipu_config()
opts = select_ipus(opts, indices=[0])
ipu.utils.configure_ipu_system(opts)
with tf.Session() as s:
  ...

# Create a single device with 1 IPU at PCI address 0000:8b:00.0 by using
# IPU configuration index 8
opts = create_ipu_config()
opts = select_ipus(opts, indices=[8])
ipu.utils.configure_ipu_system(opts)
with tf.Session() as s:
  ...

# Create two TensorFlow devices, with one IPU each, being devices at
# indices 0 and 1
opts = create_ipu_config()
opts = select_ipus(opts, indices=[0, 1])
ipu.utils.configure_ipu_system(opts)
with tf.Session() as s:
  ...

# Create two TensorFlow devices, with four IPUs each. The device
# configurations at indices 24 (0000:3e:00.0, 0000:44:00.0, 0000:3d:00.0,
# 0000:43:00.0) and 25 (0000:24:00.0, 0000:1b:00.0, 0000:23:00.0,
# 0000:1a:00.0)
opts = create_ipu_config()
opts = select_ipus(opts, indices=[24, 25])
ipu.utils.configure_ipu_system(opts)
with tf.Session() as s:
  ...

# Create four TensorFlow devices each with one IPU, at addresses
# 0000:1a:00.0, 0000:1b:00.0, 0000:23:00.0, 0000:24:00.0.
opts = create_ipu_config()
opts = select_ipus(opts, indices=[0, 1, 2, 3])
ipu.utils.configure_ipu_system(opts)
with tf.Session() as s:
  ...
- Parameters
opts – An IpuOptions session control protobuf.
indices – List of IPU configuration indices.
- Returns
The IpuOptions configuration protobuf, with a number of devices selected by IPU configuration index.
- tensorflow.python.ipu.utils.set_compilation_options(opts, compilation_options=None)
Set the IPU compilation options for the session.
# Create a device with compilation instrumentation enabled and
# out-of-memory reporting allowed.
opts = create_ipu_config()
opts = set_compilation_options(opts,
    compilation_options={"debug.instrument": "true",
                         "debug.allowOutOfMemory": "true"})
ipu.utils.configure_ipu_system(opts)
with tf.Session() as s:
  ...
- Parameters
opts – An IpuOptions session control protobuf.
compilation_options – A dictionary of poplar compilation option flags to be sent to the executor.
- Returns
The IpuOptions configuration protobuf, with engine compilation options set.
- tensorflow.python.ipu.utils.set_convolution_options(opts, convolution_options=None)
Set the IPU convolution options for the session.
# Set "availableMemoryProportion" flag to "0.1" opts = create_ipu_config() opts = set_convolution_options(opts, convolution_options={"availableMemoryProportion": "0.1"}) ipu.utils.configure_ipu_system(opts) with tf.Session() as s: ...
- Parameters
opts – An IpuOptions session control protobuf.
convolution_options – A dictionary of poplar option flags for convolutions. The “availableMemoryProportion” flag indicates the proportion of tile memory to be made available as temporary memory for convolutions (float between 0 and 1.0). Less temporary memory will generally result in a convolution that takes more cycles to complete. However, because always live memory (such as control code and vertex state) is not tracked when planning it, a convolution using less temporary memory may use more memory overall, due to an increase of always live memory.
- Returns
The IpuOptions configuration protobuf, with convolution options set.
- tensorflow.python.ipu.utils.set_experimental_multi_replica_distribution_options(opts, process_count, process_index)
This will use the Poplar runtime replica subset feature to let multiple processes collaborate on executing the same Poplar program by executing a subset of the global replicas each.
The total global replication factor will be equal to the local replication factor multiplied by the process_count.
WARNING: This API is experimental and subject to change.
- Parameters
process_count – The total number of processes.
process_index – The index of the current process.
- Returns
The IpuOptions configuration protobuf.
- tensorflow.python.ipu.utils.set_floating_point_behaviour_options(opts, inv=True, div0=True, oflo=True, esr=True, nanoo=True)
Set the IPU floating point control behaviour bits.
See the Poplar API documentation for poplar::FloatingPointBehaviour.
- Parameters
inv – If true a floating point invalid operation (defined by IEEE 754) will cause an exception.
div0 – If true a floating point divide by zero operation will cause an exception.
oflo – If true a floating point overflow will cause an exception.
esr – Enable stochastic rounding.
nanoo – Enable Not-a-Number on overflow mode.
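A sketch of enabling all the exception bits together with stochastic rounding (opts is an IpuOptions protobuf created by create_ipu_config):

from tensorflow.python import ipu

opts = ipu.utils.create_ipu_config()
opts = ipu.utils.set_floating_point_behaviour_options(
    opts, inv=True, div0=True, oflo=True, esr=True, nanoo=True)
ipu.utils.configure_ipu_system(opts)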
- tensorflow.python.ipu.utils.set_gcl_options(opts, num_io_tiles=0, gcl_options=None)
Set the IPU options for the Graphcore Communication Library.
- Parameters
num_io_tiles – Number of tiles to reserve per IPU for the GCL collective operations.
gcl_options – A dictionary with options for configuring the GCL collective operations.
- Returns
The IpuOptions configuration protobuf.
- tensorflow.python.ipu.utils.set_ipu_connection_type(opts, connection_type=None, ipu_version=None)
Configure when to attach to the device. For example, you can use this to compile and cache a program without attaching to an IPU, and then later run on a real IPU device without recompiling. Setting the connection type doesn't impact the ability to profile a model.

# Compile without attaching to the device.
opts = create_ipu_config()
opts = set_ipu_connection_type(opts, DeviceConnectionType.ON_DEMAND)
ipu.utils.configure_ipu_system(opts)
with tf.Session() as s:
  ...
- Parameters
opts – An IpuOptions session control protobuf.
connection_type – One of DeviceConnectionType. Defaults to DeviceConnectionType.ALWAYS if None.
ipu_version – Version of the IPU hardware used (int), e.g. 1 for Mk1 and 2 for Mk2. Required if the connection_type provided is DeviceConnectionType.NEVER.
- Returns
The IpuOptions configuration protobuf.
- tensorflow.python.ipu.utils.set_ipu_model_options(opts, compile_ipu_code=True, tiles_per_ipu=None)
Set the IPU Model options.
- Parameters
compile_ipu_code – Whether or not to actually compile real IPU code for modelling.
tiles_per_ipu – The number of tiles per IPU Model device.
- Returns
The IpuOptions configuration protobuf, with IPU model options set.
- tensorflow.python.ipu.utils.set_matmul_options(opts, matmul_options=None, clear_pass_type=False)
Set the IPU matrix multiplication options for the session.
# Set "availableMemoryProportion" flag to "0.5" opts = create_ipu_config() opts = set_matmul_options(opts, matmul_options={"availableMemoryProportion": "0.5"}) ipu.utils.configure_ipu_system(opts) with tf.Session() as s: ...
- Parameters
opts – An IpuOptions session control protobuf.
matmul_options – A dictionary containing the poplar option flag “availableMemoryProportion” for the matrix multiplication operations. It indicates the proportion of tile memory to be made available as temporary memory for the matrix multiplications (float between 0 and 1.0). Less temporary memory will generally result in a multiplication that takes more cycles to complete. However, because always live memory (like code and vertex state) is not tracked when planning it, a multiplication using less temporary memory may use more memory overall, due to an increase of always live memory.
clear_pass_type – When set to True, the Pass type will not be set in the options passed to the poplar operation.
- Returns
The IpuOptions configuration protobuf, with matmul options set.
- tensorflow.python.ipu.utils.set_norm_options(opts, use_stable_statistics=False)
Set the IPU options related to norms.
- Parameters
use_stable_statistics – If True, computes the mean first and subtracts it from the activations before computing the variance. The implementation with this flag set to True is slower than when set to False.
- Returns
The IpuOptions configuration protobuf.
- tensorflow.python.ipu.utils.set_optimization_options(opts, combine_embedding_lookups=False, combine_matmuls=False, max_cross_replica_sum_buffer_size=0, max_reduce_scatter_buffer_size=0, max_inter_ipu_copies_buffer_size=0, max_send_recv_cluster_size=0, minimum_remote_tensor_size=128, gather_simplifier=False, triangular_solve_expander_block_size=0, enable_fast_math=False)
Set the IPU options related to performance / optimizations.
# Create a device with fusion for multiSlices sharing the same input
# enabled.
opts = create_ipu_config()
opts = set_optimization_options(opts, combine_embedding_lookups=True)
ipu.utils.configure_ipu_system(opts)
with tf.Session() as s:
  ...
- Parameters
combine_embedding_lookups – Fuse embedding lookups on the same tensor. This might improve performance but increase memory usage.
combine_matmuls – Fuse matmul operations if they share the same weights or the same input.
max_cross_replica_sum_buffer_size – The maximum number of bytes that can be waiting before a cross replica sum op is scheduled.
max_reduce_scatter_buffer_size – The maximum number of bytes that can be waiting before a reduce scatter op is scheduled.
max_inter_ipu_copies_buffer_size – The maximum number of bytes that can be waiting before an inter-IPU copy is scheduled.
max_send_recv_cluster_size – The maximum number of bytes that can be waiting before a cluster of send/recv instructions to/from the host is scheduled. These are lowered to stream copies that can be merged by Poplar.
minimum_remote_tensor_size – The minimum size (in bytes) a tensor must be in order to be considered for storage in remote memory.
gather_simplifier – Will enable more aggressive optimisation for embedding lookups.
triangular_solve_expander_block_size – Defines size for triangular solver expander blocks. 0 - implementation defined default.
enable_fast_math – Enables optimizations which allow arbitrary reassociations and transformations of mathematical operations with no accuracy guarantees. Enabling this option can result in incorrect output for programs that depend on an exact implementation of IEEE behaviour for math functions. It may, however, yield faster code for programs that do not require the guarantees of these specifications.
- Returns
The IpuOptions configuration protobuf.
- tensorflow.python.ipu.utils.set_pooling_options(opts, pooling_options=None)
Set the IPU pooling compilation options for the session.
# Set "poolUseIntrospectiveMapping" flag to "false" opts = create_ipu_config() opts = set_pooling_options(opts, pooling_options={"poolUseIntrospectiveMapping": "false"}) ipu.utils.configure_ipu_system(opts) with tf.Session() as s: ...
- Parameters
opts – An IpuOptions session control protobuf.
pooling_options – A dictionary of poplar option flags for the pooling operation.
- Returns
The IpuOptions configuration protobuf, with pooling options set.
- tensorflow.python.ipu.utils.set_recomputation_options(opts, allow_recompute=True, allow_stateful_recompute=None)
Set re-computation options. (deprecated arguments)
Warning: SOME ARGUMENTS ARE DEPRECATED: (allow_stateful_recompute). They will be removed in a future version. Instructions for updating: pipelining recomputation will recompute all the non-stateful operations when recomputation is enabled.
- Parameters
allow_recompute – Whether or not to re-compute instructions during training. If this is enabled then we will attempt to pattern match instructions/pipeline stages in the forward pass and recompute them in the backward pass to avoid having to preserve activations which increase the maximum memory liveness. Enabling this option can reduce memory usage at the expense of extra computation. Any stateful operations cannot be recomputed.
allow_stateful_recompute – Deprecated.
- Returns
The IpuOptions configuration protobuf.
- tensorflow.python.ipu.utils.set_report_options(opts, report_options=None, graph_options=None, execution_options=None)
Set the options used to influence Poplar graph and execution report generation. (deprecated arguments)
Warning: SOME ARGUMENTS ARE DEPRECATED: (report_options). They will be removed in a future version. Instructions for updating: report_options is deprecated, use graph_options and execution_options instead.

opts = create_ipu_config()
opts = set_report_options(opts,
    report_options={"reportOption1": "false"},
    graph_options={"graphOptions": "false"},
    execution_options={"executionOptions": "false"})
ipu.utils.configure_ipu_system(opts)
with tf.Session() as s:
  ...
- Parameters
opts – An IpuOptions session control protobuf.
report_options – (Deprecated) A dictionary of poplar option flags for the report generation.
graph_options – A dictionary of poplar option flags for the graph report generation.
execution_options – A dictionary of poplar option flags for the execution report generation.
- Returns
The IpuOptions configuration protobuf, with report options set.
- tensorflow.python.ipu.utils.set_serialization_options(opts, output_folder='')
Enable / disable the serialization to disk of the compiled executables.
# Create a device that will save to disk all the compiled executables.
opts = create_ipu_config()
opts = set_serialization_options(opts, output_folder="/tmp/my_network")
ipu.utils.configure_ipu_system(opts)
with tf.Session() as s:
  ...
- Parameters
output_folder – Where to save the compiled executables. Set to “” to disable serialization.
- Returns
The IpuOptions configuration protobuf.
- tensorflow.python.ipu.utils.set_transfer_options(opts, use_verified_transfers=False)
Set the IPU options related to Poplar data transfers.
- Parameters
opts – An IpuOptions session control protobuf.
use_verified_transfers – If True, use Poplar’s verified transfers.
- Returns
The IpuOptions configuration protobuf.
- tensorflow.python.ipu.utils.set_verification_options(opts, verification_options)
Set the pairs of key/id to use for each type of data used in the graph when verified transfers are enabled.

# Create a device which will use verified transfers with different keys.
opts = create_ipu_config()
opts = set_transfer_options(opts, use_verified_transfers=True)
o = VerificationOptions()
o.input_parameters = KeyId(1)
o.infeeds["training_feed"] = KeyId(2)
opts = set_verification_options(opts, o)
ipu.utils.configure_ipu_system(opts)
with tf.Session() as s:
  ...
- Parameters
opts – An IpuOptions session control protobuf.
verification_options – a VerificationOptions object that contains the keys / ids to use.
14.7. Looping utilities
- tensorflow.python.ipu.loops.repeat(n, body, inputs=None, infeed_queue=None, use_while_v1=True)
Builds a loop that executes a fixed number of iterations.
The set of loop-carried tensors corresponds to inputs. body must be a function that takes and returns the values of the loop-carried tensors.
- Parameters
n – the number of loop iterations
body – a Python function that builds the loop body.
inputs – a list of initial values passed into the loop or None (equivalent to an empty list).
infeed_queue – if not None, the IPUInfeedQueue from which data is consumed.
use_while_v1 – if True, then use a Tensorflow v1.x dataflow while loop.
- Returns
The final values of the loop-carried tensors.
- Raises
ValueError – if there is a type error.
TypeError – if body has the wrong signature.
- tensorflow.python.ipu.loops.while_loop(condition, body, inputs=None, infeed_queue=None, maximum_iterations=None, use_while_v1=True)
Builds a while loop for IPUs.
The set of loop-carried tensors corresponds to inputs. Both condition and body take the current value of the loop-carried tensors. condition must return a single boolean value that determines whether iteration continues. body must return an updated list of values for the loop-carried tensors.
- Parameters
condition – a Python function that builds the loop condition.
body – a Python function that builds the loop body.
inputs – a list of initial values passed into the loop, or None (equivalent to an empty list).
infeed_queue – if not None, the IPUInfeedQueue from which data is consumed.
use_while_v1 – if True, then use a Tensorflow v1.x dataflow while loop.
- Returns
The final values of the loop-carried tensors.
- Raises
TypeError – if body or condition has the wrong signature.
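A minimal sketch of a counting loop (the names and shapes are illustrative):

import numpy as np
import tensorflow as tf
from tensorflow.python import ipu

def my_net(x):
    def condition(i, x):
        return i < 10

    def body(i, x):
        return i + 1, x + 2.0

    i = tf.constant(0)
    return ipu.loops.while_loop(condition, body, inputs=[i, x])

with tf.device('cpu'):
    x = tf.placeholder(np.float32, [])

with ipu.scopes.ipu_scope("/device:IPU:0"):
    res = ipu.ipu_compiler.compile(my_net, inputs=[x])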
14.8. Distributed training
- class tensorflow.python.ipu.ipu_multi_worker_strategy.IPUMultiWorkerStrategy(cluster_resolver, ipu_device='/device:IPU:0', variables_on_host=False)
This is a distribution strategy for synchronous training using IPUs on multiple workers with between-graph replication.
By default variables and ops are placed on the IPU of each worker, but variables can optionally be placed on the host by setting
variables_on_host=True
. In any case, this strategy will make sure that variables are kept in sync between the workers by performing multi-worker reductions.The multi-worker reductions are done using TensorFlow’s implementation of collective operations over gRPC.
Variable synchronization
The default behavior is to sync (allreduce) the variables when they are written (sync-on-write). This is a good choice when reads are at least as common as writes. However, for variables where writes are more common than reads (like metrics or population statistics in batch normalization layers), it is beneficial to only sync (allreduce) the variables when they are read (sync-on-read).
In both cases, it is important that all the workers participate in the sync, otherwise progress will be blocked. Take special care in the latter case (with sync-on-read variables), because it implies that all the workers need to read these variables at the same time. For example, it implies that all the workers must checkpoint the model at the same time.
Sync-on-read variables are placed on the IPU even when variables were requested placed on the host (with
variables_on_host=True
), because it allows the ops to update the variables directly on the IPU without any host involvement. Only when the variable is read, it is streamed to the host and allreduced there.Weight updates
When used during training with an `Optimizer`, there is an implicit allreduce in the `optimizer.apply_gradients()` function (which is called from `optimizer.minimize()`). This will automatically cause the gradients to be streamed to the host of each worker, allreduced between the workers, and then streamed back to the IPU of each worker, where identical weight updates are performed (keeping the workers in sync). This is done even when the call to `optimizer.apply_gradients()` is inside a function passed to `ipu_compiler.compile()`, as the allreduce is extracted from the compiled XLA cluster and placed on the host in the outside graph (by internally using an `outside_compilation_scope()`).

When variables are placed on the host, the weight updates should also be placed on the host. In other words, the `optimizer.compute_gradients()` call should be placed on the IPU, while the `optimizer.apply_gradients()` call should be placed on the host. This must be done explicitly. In this scenario all the “slot” variables used by the optimizer (e.g. the momentum accumulator) are then also kept only in host memory and never used on the IPU, saving IPU memory.

Compatibility
- `IPUEstimator`: Pass the `IPUMultiWorkerStrategy` instance to the `RunConfig` as the `train_distribute` argument. When variables are placed on the host, the `optimizer.apply_gradients()` call should also be placed on the host by using the `IPUEstimatorSpec` `host_call` argument. See full example: Distributed training.
- `IPUPipelineEstimator`: Pass the `IPUMultiWorkerStrategy` instance to the `RunConfig` as the `train_distribute` argument. Placing variables on the host is not currently supported here.
- Keras `Model.fit`: Not currently supported.
- Custom training loop: Pass the training step function to `IPUMultiWorkerStrategy.experimental_run_v2()`. With variables on the IPU, the `optimizer.apply_gradients()` call can be done from an XLA compiled IPU function, and the inter-host allreduce will be automatically extracted from the compiled XLA cluster and placed on the host. With variables on the host, the `optimizer.apply_gradients()` call must be explicitly placed on the host.

Example using a custom training loop with pipelining
cluster_resolver = tf.distribute.cluster_resolver.TFConfigClusterResolver()
strategy = IPUMultiWorkerStrategy(cluster_resolver)

sess_config = tf.ConfigProto()
sess_config = strategy.update_config_proto(sess_config)
server = tf.distribute.Server(cluster_resolver.cluster_spec(),
                              job_name=cluster_resolver.task_type,
                              task_index=cluster_resolver.task_id,
                              config=sess_config)
sess_target = server.target

with strategy.scope():
    infeed_queue = ipu_infeed_queue.IPUInfeedQueue(dataset, "infeed")
    outfeed_queue = ipu_outfeed_queue.IPUOutfeedQueue("outfeed")

    def stage1(lr, images, labels):
        partial = keras.layers.Dense(256, activation="relu")(images)
        partial = keras.layers.Dense(128, activation="relu")(partial)
        return lr, partial, labels

    def stage2(lr, partial, labels):
        logits = keras.layers.Dense(10)(partial)
        per_example_loss = keras.losses.sparse_categorical_crossentropy(
            y_true=labels, y_pred=logits, from_logits=True)
        # In a custom training loop, the optimiser does an allreduce *sum*, not
        # average, of the gradients across the distributed workers. Therefore
        # we want to divide the loss here by the *global* batch size, which is
        # done by the `tf.nn.compute_average_loss()` function.
        loss = nn.compute_average_loss(per_example_loss)
        return lr, loss

    def optimizer_function(lr, loss):
        optimizer = GradientDescentOptimizer(lr)
        return pipelining_ops.OptimizerFunctionOutput(optimizer, loss)

    def model(lr):
        pipeline_op = pipelining_ops.pipeline(
            computational_stages=[stage1, stage2],
            gradient_accumulation_count=gradient_accumulation_count,
            inputs=[lr],
            infeed_queue=infeed_queue,
            outfeed_queue=outfeed_queue,
            optimizer_function=optimizer_function,
            name="Pipeline")
        return pipeline_op

    def compiled_model(lr):
        with ipu_scope("/device:IPU:0"):
            return ipu_compiler.compile(model, inputs=[lr])

    with ops.device("cpu"):
        lr = array_ops.placeholder(np.float32, [])

    train_op = strategy.experimental_run_v2(compiled_model, args=[lr])

    _, per_worker_losses = outfeed_queue.dequeue()

    # Mean across the local `gradient_accumulation_count` batches:
    per_worker_loss = math_ops.reduce_mean(per_worker_losses)

    # Global mean across the distributed workers (since it is already
    # divided by the global batch size above, we do a sum here):
    global_loss = strategy.reduce(ReduceOp.SUM, per_worker_loss)

config = ipu_utils.create_ipu_config()
config = ipu_utils.auto_select_ipus(config, num_ipus=2)
ipu_utils.configure_ipu_system(config)
ipu_utils.move_variable_initialization_to_cpu()

with session_lib.Session(target=sess_target, config=sess_config) as sess:
    sess.run(infeed_queue.initializer)
    sess.run(variables.global_variables_initializer())

    for _ in range(10):
        sess.run(train_op, {lr: 0.01})
        global_loss_val = sess.run(global_loss)
14.9. Datasets
14.9.1. Dataset benchmarking
- tensorflow.python.ipu.dataset_benchmark.dataset_benchmark(dataset, number_of_epochs, elements_per_epochs, print_stats=True, apply_options=True)
Allows the user to benchmark performance of a `tf.data.Dataset`.
- Parameters
dataset – An instance of `tf.data.Dataset` which will be benchmarked.
number_of_epochs – The number of epochs this dataset will be run for.
elements_per_epochs – The number of elements there are in each epoch.
print_stats – Whether to print statistics about the performance to the console.
apply_options – Whether to apply optimization options which can improve the dataset performance.
- Returns
A JSON string with performance statistics, which records the following metrics every epoch:
- `elements_processed` - number of elements processed.
- `total_bytes_processed` - total number of bytes which were processed.
- `time_elapsed` - the time it took (in seconds) for the epoch to complete.
- `elements_per_second` - number of elements processed per second.
- `bandwidth` - the bandwidth achieved, measured in GB/s.

The returned JSON string can be parsed with the native Python json library (see https://docs.python.org/3/library/json.html).
- Raises
TypeError – if `dataset` is not an instance of `tf.data.Dataset`.
ValueError – if `number_of_epochs` or `elements_per_epochs` is less than 1.
- tensorflow.python.ipu.dataset_benchmark.infeed_benchmark(infeed_queue, number_of_epochs, elements_per_epochs, print_stats=True)
Allows the user to benchmark performance of an `ipu.ipu_infeed_queue.IPUInfeedQueue`.
- Parameters
infeed_queue – An instance of `ipu.ipu_infeed_queue.IPUInfeedQueue` which will be benchmarked.
number_of_epochs – The number of epochs this infeed queue will be run for.
elements_per_epochs – The number of elements there are in each epoch.
print_stats – Whether to print statistics about the performance to the console.
- Returns
A JSON string with performance statistics, which records the following metrics every epoch:
- `elements_processed` - number of elements processed.
- `total_bytes_processed` - total number of bytes which were processed.
- `time_elapsed` - the time it took (in seconds) for the epoch to complete.
- `elements_per_second` - number of elements processed per second.
- `bandwidth` - the bandwidth achieved, measured in GB/s.

The returned JSON string can be parsed with the native Python json library (see https://docs.python.org/3/library/json.html).
- Raises
TypeError – if `infeed_queue` is not an instance of `ipu.ipu_infeed_queue.IPUInfeedQueue`.
ValueError – if `number_of_epochs` or `elements_per_epochs` is less than 1.
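The usage mirrors `dataset_benchmark`, with the dataset first wrapped in an infeed queue; a brief sketch:

import tensorflow as tf
from tensorflow.python import ipu

dataset = tf.data.Dataset.from_tensor_slices(
    tf.zeros([1000, 128], dtype=tf.float32)).repeat()

# Benchmark the infeed queue rather than the raw dataset, which also
# measures the cost of the host-side feeding machinery.
infeed_queue = ipu.ipu_infeed_queue.IPUInfeedQueue(dataset, "benchmark_infeed")

benchmark_op = ipu.dataset_benchmark.infeed_benchmark(
    infeed_queue, number_of_epochs=5, elements_per_epochs=1000)

with tf.Session() as sess:
    print(sess.run(benchmark_op))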
14.9.2. Dataset wrappers
- class tensorflow.python.ipu.data.ops.dataset_ops.BufferDataset(input_dataset, buffer_size)
A `Dataset` which makes sure there is a multiple of `buffer_size` elements available.
- __init__(input_dataset, buffer_size)
A `Dataset` which makes sure there is a multiple of `buffer_size` elements available.
- Parameters
input_dataset – The input dataset.
buffer_size – The number of dataset elements which will be available.
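A brief sketch of wrapping an existing dataset (the base dataset here is arbitrary):

import tensorflow as tf
from tensorflow.python.ipu.data.ops.dataset_ops import BufferDataset

base = tf.data.Dataset.range(100)
# Guarantee that elements become available in multiples of 16.
dataset = BufferDataset(base, buffer_size=16)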
14.10. Estimators
14.10.1. IPUEstimator
- class tensorflow.python.ipu.ipu_estimator.IPUEstimator(model_fn, model_dir=None, config=None, params=None, warm_start_from=None, train_batch_size=None, eval_batch_size=None, predict_batch_size=None)
Estimator with IPU support.
IPUEstimator handles many of the details of running on IPUs, such as placement of operations and tensors, graph compilation and usage of data feeds. It also provides a simple way to use multiple IPUs in the form of either data parallelism or model parallelism.
The data parallelism is based on graph replication. One batch from the dataset returned by the `input_fn` (of size `batch_size`) is sent to each replica, giving an effective batch size of `num_replicas * batch_size`. The only change needed to the `model_fn` is that the optimizer should be wrapped in a `CrossReplicaOptimizer` in order to average the gradients across the replicas.

This can also be combined with distributed multi-worker training using the `IPUMultiWorkerStrategy`, giving a total effective batch size of `num_workers * num_replicas * batch_size`.

The desired global batch size can be passed as `train_batch_size`, `eval_batch_size` and `predict_batch_size`, and the local batch size will be calculated based on the number of replicas and the number of distributed workers and passed to the `input_fn` and `model_fn` in `params['batch_size']`. If the `input_fn` returns a dataset batched with `dataset.batch(params['batch_size'], drop_remainder=True)`, the global batch size will be as desired.

The model parallelism supported by this class is basic sharding. Consider using the `IPUPipelineEstimator` to get pipelined execution.

For efficiency, it supports compiling a graph that contains multiple iterations of the training/prediction/evaluation loop, which will be fully executed on the IPU before yielding back to the TensorFlow Python runtime on the CPU.
See https://tensorflow.org/guide/estimators for general information about estimators.
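To make the moving parts concrete, here is a minimal training sketch; the model, dataset and hyperparameters are hypothetical, and the configuration calls follow the patterns shown elsewhere in this document:

import numpy as np
import tensorflow as tf
from tensorflow.python import ipu

def my_model_fn(features, labels, mode):
    logits = tf.layers.dense(features, 10)
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=labels, logits=logits))
    optimizer = tf.train.GradientDescentOptimizer(0.01)
    train_op = optimizer.minimize(loss, tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

def my_input_fn(params):
    # `params["batch_size"]` is the local batch size derived from
    # `train_batch_size`.
    features = np.random.rand(1000, 20).astype(np.float32)
    labels = np.random.randint(10, size=1000).astype(np.int32)
    dataset = tf.data.Dataset.from_tensor_slices((features, labels)).repeat()
    return dataset.batch(params["batch_size"], drop_remainder=True)

ipu_options = ipu.utils.create_ipu_config()
ipu_options = ipu.utils.auto_select_ipus(ipu_options, num_ipus=1)
config = ipu.ipu_run_config.RunConfig(
    ipu_run_config=ipu.ipu_run_config.IPURunConfig(
        iterations_per_loop=100, ipu_options=ipu_options))

estimator = ipu.ipu_estimator.IPUEstimator(
    model_fn=my_model_fn, config=config, train_batch_size=16)
estimator.train(input_fn=my_input_fn, steps=1000)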
- Parameters
model_fn – The model function. Refer to https://github.com/tensorflow/docs/blob/master/site/en/r1/guide/custom_estimators.md#write-a-model-function for details on how to write this function.
model_dir – Directory to save model parameters, graph, etc. This can also be used to load checkpoints from the directory into an estimator to continue training a previously saved model. If a `PathLike` object, the path will be resolved. If `None`, the model_dir in `config` will be used if set. If both are set, they must be the same. If both are `None`, a temporary directory will be used.
config – A `RunConfig` object.
params – `dict` of hyper parameters that will be passed into `model_fn`. Keys are names of parameters, values are basic python types.
warm_start_from – Optional string filepath to a checkpoint or SavedModel to warm-start from, or a `tf.estimator.WarmStartSettings` object to fully configure warm-starting. If the string filepath is provided instead of a `tf.estimator.WarmStartSettings`, then all variables are warm-started, and it is assumed that vocabularies and `tf.Tensor` names are unchanged.
train_batch_size – If not None, an int representing the global training batch size. This global batch size is transformed to a local batch size passed as `params['batch_size']` to the `input_fn` and `model_fn` during training. Must be divisible by the number of replicas multiplied by the number of distributed workers.
eval_batch_size – If not None, an int representing the global evaluation batch size. Same behaviour as train_batch_size, only during evaluation.
predict_batch_size – If not None, an int representing the global prediction batch size. Same behaviour as train_batch_size, only during prediction.
- eval_dir(name=None)
Shows the directory name where evaluation metrics are dumped.
- Parameters
name – Name of the evaluation if the user needs to run multiple evaluations on different data sets, such as on training data vs test data. Metrics for different evaluations are saved in separate folders, and appear separately in TensorBoard.
- Returns
A string which is the path of the directory containing the evaluation metrics.
- evaluate(input_fn, steps=None, hooks=None, checkpoint_path=None, name=None)
Evaluates the model given evaluation data `input_fn`.
- Parameters
input_fn – A function that constructs the input data for evaluation. The function should return a `tf.data.Dataset` object. The outputs of the `Dataset` object must be a tuple `(features, labels)` where `features` is a `tf.Tensor` or a dictionary of string feature name to `Tensor`, and `labels` is a `Tensor` or a dictionary of string label name to `Tensor`. Both `features` and `labels` are consumed by `model_fn`.
steps – Number of steps for which to evaluate the model.
hooks – List of `tf.train.SessionRunHook` subclass instances. Used for callbacks inside the evaluation call.
checkpoint_path – Path of a specific checkpoint to evaluate. If `None`, the latest checkpoint in `model_dir` is used. If there are no checkpoints in `model_dir`, evaluation is run with newly initialized `Variables` instead of ones restored from checkpoint.
name – Name of the evaluation if the user needs to run multiple evaluations on different data sets, such as on training data vs test data. Metrics for different evaluations are saved in separate folders, and appear separately in TensorBoard.
- Returns
A dict containing the evaluation metrics specified in `model_fn` keyed by name, as well as an entry `global_step` which contains the value of the global step for which this evaluation was performed.
- experimental_export_all_saved_models(export_dir_base, input_receiver_fn_map, assets_extra=None, as_text=False, checkpoint_path=None)
Exports a `SavedModel` with `tf.MetaGraphDefs` for each requested mode.

For each mode passed in via the `input_receiver_fn_map`, this method builds a new graph by calling the `input_receiver_fn` to obtain feature and label `Tensor`s. Next, this method calls the `Estimator`’s `model_fn` in the passed mode to generate the model graph based on those features and labels, and restores the given checkpoint (or, lacking that, the most recent checkpoint) into the graph. Only one of the modes is used for saving variables to the `SavedModel` (order of preference: `tf.estimator.ModeKeys.TRAIN`, `tf.estimator.ModeKeys.EVAL`, then `tf.estimator.ModeKeys.PREDICT`), such that up to three `tf.MetaGraphDefs` are saved with a single set of variables in a single `SavedModel` directory.

For the variables and `tf.MetaGraphDefs`, this method creates a timestamped export directory below `export_dir_base`, and writes a `SavedModel` into it containing the `tf.MetaGraphDef` for the given mode and its associated signatures.

For prediction, the exported `MetaGraphDef` will provide one `SignatureDef` for each element of the `export_outputs` dict returned from the `model_fn`, named using the same keys. One of these keys is always `tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY`, indicating which signature will be served when a serving request does not specify one. For each signature, the outputs are provided by the corresponding `tf.estimator.export.ExportOutput`s, and the inputs are always the input receivers provided by the `serving_input_receiver_fn`.

For training and evaluation, the `train_op` is stored in an extra collection, and loss, metrics, and predictions are included in a `SignatureDef` for the mode in question.

Extra assets may be written into the `SavedModel` via the `assets_extra` argument. This should be a dict, where each key gives a destination path (including the filename) relative to the assets.extra directory. The corresponding value gives the full path of the source file to be copied. For example, the simple case of copying a single file without renaming it is specified as `{'my_asset_file.txt': '/path/to/my_asset_file.txt'}`.
- Parameters
export_dir_base – A string containing a directory in which to create timestamped subdirectories containing exported `SavedModel`s.
input_receiver_fn_map – dict of `tf.estimator.ModeKeys` to `input_receiver_fn` mappings, where the `input_receiver_fn` is a function that takes no arguments and returns the appropriate subclass of `InputReceiver`.
assets_extra – A dict specifying how to populate the assets.extra directory within the exported `SavedModel`, or `None` if no extra assets are needed.
as_text – whether to write the `SavedModel` proto in text format.
checkpoint_path – The checkpoint path to export. If `None` (the default), the most recent checkpoint found within the model directory is chosen.
- Returns
The string path to the exported directory.
- Raises
ValueError – if any `input_receiver_fn` is `None`, no `export_outputs` are provided, or no checkpoint can be found.
- export_saved_model(export_dir_base, serving_input_receiver_fn, assets_extra=None, as_text=False, checkpoint_path=None, experimental_mode='infer')
Exports inference graph as a `SavedModel` into the given dir.

For a detailed guide, see [Using SavedModel with Estimators](https://tensorflow.org/guide/saved_model#using_savedmodel_with_estimators).

This method builds a new graph by first calling the `serving_input_receiver_fn` to obtain feature `Tensor`s, and then calling this `Estimator`’s `model_fn` to generate the model graph based on those features. It restores the given checkpoint (or, lacking that, the most recent checkpoint) into this graph in a fresh session. Finally it creates a timestamped export directory below the given `export_dir_base`, and writes a `SavedModel` into it containing a single `tf.MetaGraphDef` saved from this session.

The exported `MetaGraphDef` will provide one `SignatureDef` for each element of the `export_outputs` dict returned from the `model_fn`, named using the same keys. One of these keys is always `tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY`, indicating which signature will be served when a serving request does not specify one. For each signature, the outputs are provided by the corresponding `tf.estimator.export.ExportOutput`s, and the inputs are always the input receivers provided by the `serving_input_receiver_fn`.

Extra assets may be written into the `SavedModel` via the `assets_extra` argument. This should be a dict, where each key gives a destination path (including the filename) relative to the assets.extra directory. The corresponding value gives the full path of the source file to be copied. For example, the simple case of copying a single file without renaming it is specified as `{'my_asset_file.txt': '/path/to/my_asset_file.txt'}`.

The `experimental_mode` parameter can be used to export a single train/eval/predict graph as a `SavedModel`. See `experimental_export_all_saved_models` for full docs.
- Parameters
export_dir_base – A string containing a directory in which to create timestamped subdirectories containing exported `SavedModel`s.
serving_input_receiver_fn – A function that takes no argument and returns a `tf.estimator.export.ServingInputReceiver` or `tf.estimator.export.TensorServingInputReceiver`.
assets_extra – A dict specifying how to populate the assets.extra directory within the exported `SavedModel`, or `None` if no extra assets are needed.
as_text – whether to write the `SavedModel` proto in text format.
checkpoint_path – The checkpoint path to export. If `None` (the default), the most recent checkpoint found within the model directory is chosen.
experimental_mode – `tf.estimator.ModeKeys` value indicating which mode will be exported. Note that this feature is experimental.
- Returns
The string path to the exported directory.
- Raises
ValueError – if no `serving_input_receiver_fn` is provided, no `export_outputs` are provided, or no checkpoint can be found.
- export_savedmodel(export_dir_base, serving_input_receiver_fn, assets_extra=None, as_text=False, checkpoint_path=None, strip_default_attrs=False)
Exports inference graph as a `SavedModel` into the given dir. (deprecated)

Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: This function has been renamed, use `export_saved_model` instead.

For a detailed guide, see [Using SavedModel with Estimators](https://tensorflow.org/guide/saved_model#using_savedmodel_with_estimators).

This method builds a new graph by first calling the `serving_input_receiver_fn` to obtain feature `Tensor`s, and then calling this `Estimator`’s `model_fn` to generate the model graph based on those features. It restores the given checkpoint (or, lacking that, the most recent checkpoint) into this graph in a fresh session. Finally it creates a timestamped export directory below the given `export_dir_base`, and writes a `SavedModel` into it containing a single `tf.MetaGraphDef` saved from this session.

The exported `MetaGraphDef` will provide one `SignatureDef` for each element of the `export_outputs` dict returned from the `model_fn`, named using the same keys. One of these keys is always `tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY`, indicating which signature will be served when a serving request does not specify one. For each signature, the outputs are provided by the corresponding `tf.estimator.export.ExportOutput`s, and the inputs are always the input receivers provided by the `serving_input_receiver_fn`.

Extra assets may be written into the `SavedModel` via the `assets_extra` argument. This should be a dict, where each key gives a destination path (including the filename) relative to the assets.extra directory. The corresponding value gives the full path of the source file to be copied. For example, the simple case of copying a single file without renaming it is specified as `{'my_asset_file.txt': '/path/to/my_asset_file.txt'}`.
- Parameters
export_dir_base – A string containing a directory in which to create timestamped subdirectories containing exported `SavedModel`s.
serving_input_receiver_fn – A function that takes no argument and returns a `tf.estimator.export.ServingInputReceiver` or `tf.estimator.export.TensorServingInputReceiver`.
assets_extra – A dict specifying how to populate the assets.extra directory within the exported `SavedModel`, or `None` if no extra assets are needed.
as_text – whether to write the `SavedModel` proto in text format.
checkpoint_path – The checkpoint path to export. If `None` (the default), the most recent checkpoint found within the model directory is chosen.
strip_default_attrs – Boolean. If `True`, default-valued attributes will be removed from the `NodeDef`s. For a detailed guide, see [Stripping Default-Valued Attributes](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/saved_model/README.md#stripping-default-valued-attributes).
- Returns
The string path to the exported directory.
- Raises
ValueError – if no `serving_input_receiver_fn` is provided, no `export_outputs` are provided, or no checkpoint can be found.
- get_variable_names()
Returns list of all variable names in this model.
- Returns
List of names.
- Raises
ValueError – If the `Estimator` has not produced a checkpoint yet.
- get_variable_value(name)
Returns value of the variable given by name.
- Parameters
name – string or a list of strings; name of the tensor.
- Returns
Numpy array - value of the tensor.
- Raises
ValueError – If the `Estimator` has not produced a checkpoint yet.
- latest_checkpoint()
Finds the filename of the latest saved checkpoint file in `model_dir`.
- Returns
The full path to the latest checkpoint, or `None` if no checkpoint was found.
- property model_fn
Returns the `model_fn` which is bound to `self.params`.
- Returns
The `model_fn`, with signature `def model_fn(features, labels, mode, config)`.
- predict(input_fn, predict_keys=None, hooks=None, checkpoint_path=None, yield_single_examples=True, num_predictions=None)
Yields predictions for given features.
- Parameters
input_fn – A function that constructs the features. The function should return a `tf.data.Dataset` object. The outputs of the `Dataset` object should be one of the following:
  - features: A `Tensor` or a dictionary of string feature name to `Tensor`. features are consumed by `model_fn`.
  - A tuple, in which case the first item is extracted as features.
predict_keys – list of `str`, name of the keys to predict. It is used if the `tf.estimator.EstimatorSpec.predictions` is a `dict`. If `predict_keys` is used then the rest of the predictions will be filtered from the dictionary. If `None`, returns all.
hooks – List of `tf.train.SessionRunHook` subclass instances. Used for callbacks inside the prediction call.
checkpoint_path – Path of a specific checkpoint to predict. If `None`, the latest checkpoint in `model_dir` is used. If there are no checkpoints in `model_dir`, prediction is run with newly initialized `Variables` instead of ones restored from checkpoint.
yield_single_examples – If `False`, yields the whole batch as returned by the `model_fn` instead of decomposing the batch into individual elements. This is useful if `model_fn` returns some tensors whose first dimension is not equal to the batch size.
num_predictions – If not `None`, the generator will raise `StopIteration` after yielding this number of predictions. This allows draining the generator by using `list(predictions)`. If `None`, the returned generator is infinite and will trigger a fatal error if you try to consume more predictions from it than what is actually generated, instead of raising the `StopIteration` exception. This is caused by the current behaviour when requesting to run a loop on the IPU for more iterations than there are elements remaining in the dataset. In this case you cannot drain it by using `list(predictions)`; you have to consume the expected number of elements yourself, e.g. using `[next(predictions) for _ in range(num_predictions)]`.
- Yields
Evaluated values of `predictions` tensors.
- train(input_fn, hooks=None, steps=None, max_steps=None, saving_listeners=None)
Trains a model given training data `input_fn`.
- Parameters
input_fn – A function that provides input data for training as minibatches. The function should return a `tf.data.Dataset` object. The outputs of the `Dataset` object must be a tuple `(features, labels)` where `features` is a `tf.Tensor` or a dictionary of string feature name to `Tensor`, and `labels` is a `Tensor` or a dictionary of string label name to `Tensor`. Both `features` and `labels` are consumed by `model_fn`.
hooks – List of `tf.train.SessionRunHook` subclass instances. Used for callbacks inside the training loop.
steps – Number of steps for which to train the model. `steps` works incrementally: if you call `train(steps=10)` twice, training occurs for 20 steps in total. If you don't want incremental behaviour, set `max_steps` instead. If set, `max_steps` must be `None`.
max_steps – Number of total steps for which to train the model. If set, `steps` must be `None`. Two calls to `train(steps=100)` means 200 training iterations. On the other hand, two calls to `train(max_steps=100)` means that the second call will not do any iterations since the first call did all 100 steps.
saving_listeners – list of `CheckpointSaverListener` objects. Used for callbacks that run immediately before or after checkpoint savings.
- Returns
`self`, for chaining.
- class tensorflow.python.ipu.ipu_estimator.IPUEstimatorSpec(mode, predictions=None, loss=None, train_op=None, eval_metric_ops=None, eval_metrics=None, host_call=None, training_hooks=None, evaluation_hooks=None, prediction_hooks=None)
Ops and objects returned from a `model_fn` and passed to `IPUEstimator`.

This is very similar to `EstimatorSpec`, with the addition of two extra arguments: `eval_metrics` and `host_call`. If neither of those arguments is needed, an `EstimatorSpec` can be passed to the `IPUEstimator` instead.

`eval_metrics` is a tuple of (`function`, `tensors`), where `tensors` is either a list of `tf.Tensor`s or a dict from strings to `tf.Tensor`s, that is passed to the function. The function runs on the CPU and returns a dict of metrics. The tensors are transferred from the IPU to the CPU host and passed to the function.

Exactly one of `eval_metrics` and `eval_metric_ops` must be provided during evaluation. The major difference between the two is that while the `eval_metric_ops` will execute directly on the IPU, the `eval_metrics` will execute on the CPU host using the provided function. Example:

def my_metrics_fn(features, labels):
    return {
        "accuracy": tf.metrics.accuracy(labels, features),
        "precision": tf.metrics.precision(labels, features),
        "recall": tf.metrics.recall(labels, features),
    }

eval_metrics = (my_metrics_fn, [features, labels])
spec = IPUEstimatorSpec(mode, loss=loss, eval_metrics=eval_metrics)

`host_call` is a tuple of a function and a list of tensors to pass to that function. `host_call` only works for training and is executed on the CPU for every training step. The tensors are transferred from the IPU to the CPU host and passed to the function.

This functionality can be used for e.g. doing all-reduce of the gradients and weight updates on the host during distributed training with the `IPUMultiWorkerStrategy`. Example:

def my_host_fn(*host_gradients):
    # This will all-reduce the gradients and update the weights on the host.
    return optimizer.apply_gradients(zip(host_gradients, variables))

train_op = tf.identity(loss)
grads_and_vars = optimizer.compute_gradients(loss, var_list=variables)
gradients = [g for (g, _) in grads_and_vars]
host_call = (my_host_fn, gradients)
spec = IPUEstimatorSpec(mode=mode,
                        loss=loss,
                        train_op=train_op,
                        host_call=host_call)
See full example: Distributed training.
The various hooks (`training_hooks`, `evaluation_hooks`, `prediction_hooks`) support instances of `tf.estimator.SessionRunHook`. To log tensor values from within the `model_fn`, use the `IPULoggingTensorHook`.

For documentation of the remaining arguments, see `EstimatorSpec`.
.- static __new__(cls, mode, predictions=None, loss=None, train_op=None, eval_metric_ops=None, eval_metrics=None, host_call=None, training_hooks=None, evaluation_hooks=None, prediction_hooks=None)
Create new instance of IPUEstimatorSpec(mode, predictions, loss, train_op, eval_metric_ops, eval_metrics, host_call, training_hooks, evaluation_hooks, prediction_hooks)
14.10.2. IPUPipelineEstimator
- class tensorflow.python.ipu.ipu_pipeline_estimator.IPUPipelineEstimator(model_fn, model_dir=None, config=None, params=None, warm_start_from=None)
Estimator for pipelining on IPUs.
`IPUPipelineEstimator`, like `IPUEstimator`, handles many of the details of running on IPUs, such as placement of operations and tensors, graph compilation and usage of data feeds. Additionally, it adds support for pipelined execution over multiple IPUs.

The major API difference from the `IPUEstimator` is that the provided `model_fn` must return an `IPUPipelineEstimatorSpec` that contains the information needed for pipelined execution.

Data parallelism based on graph replication is supported. Each replica will consume `gradient_accumulation_count` batches from the dataset returned by the `input_fn` and accumulate the gradients, giving an effective batch size of `num_replicas * gradient_accumulation_count * batch_size`. The optimizer in the `model_fn` should be wrapped in a `CrossReplicaOptimizer` in order to average the gradients across the replicas.

This can further be combined with distributed multi-worker training using the `IPUMultiWorkerStrategy`, giving a total effective batch size of `num_workers * num_replicas * gradient_accumulation_count * batch_size`.

Refer to the `pipelining_ops` documentation for more details about pipelining.

Note: because the `model_fn` is compiled to run on the IPU, you must use the `warm_start_from` parameter for a warm start and not the `tf.train.init_from_checkpoint` method.
- Parameters
model_fn – The model function. Refer to https://github.com/tensorflow/docs/blob/master/site/en/r1/guide/custom_estimators.md#write-a-model-function for details on how to write this function.
model_dir – Directory to save model parameters, graph, etc. This can also be used to load checkpoints from the directory into an estimator to continue training a previously saved model. If a `PathLike` object, the path will be resolved. If `None`, the model_dir in `config` will be used if set. If both are set, they must be the same. If both are `None`, a temporary directory will be used.
config – A `RunConfig` object.
params – `dict` of hyper parameters that will be passed into `model_fn`. Keys are names of parameters, values are basic python types.
warm_start_from – Optional string filepath to a checkpoint or SavedModel to warm-start from, or a `tf.estimator.WarmStartSettings` object to fully configure warm-starting. If the string filepath is provided instead of a `tf.estimator.WarmStartSettings`, then all variables are warm-started, and it is assumed that vocabularies and `tf.Tensor` names are unchanged.
- eval_dir(name=None)
Shows the directory name where evaluation metrics are dumped.
- Parameters
name – Name of the evaluation if the user needs to run multiple evaluations on different data sets, such as on training data vs test data. Metrics for different evaluations are saved in separate folders, and appear separately in TensorBoard.
- Returns
A string which is the path of the directory containing the evaluation metrics.
- evaluate(input_fn, steps=None, hooks=None, checkpoint_path=None, name=None)
Evaluates the model given evaluation data `input_fn`.
- Parameters
input_fn – A function that constructs the input data for evaluation. The function should return a `tf.data.Dataset` object. The outputs of the `Dataset` object must be a tuple `(features, labels)` where `features` is a `tf.Tensor` or a dictionary of string feature name to `Tensor`, and `labels` is a `Tensor` or a dictionary of string label name to `Tensor`. Both `features` and `labels` are consumed by `model_fn`.
steps – Number of steps for which to evaluate the model.
hooks – List of `tf.train.SessionRunHook` subclass instances. Used for callbacks inside the evaluation call.
checkpoint_path – Path of a specific checkpoint to evaluate. If `None`, the latest checkpoint in `model_dir` is used. If there are no checkpoints in `model_dir`, evaluation is run with newly initialized `Variables` instead of ones restored from checkpoint.
name – Name of the evaluation if the user needs to run multiple evaluations on different data sets, such as on training data vs test data. Metrics for different evaluations are saved in separate folders, and appear separately in TensorBoard.
- Returns
A dict containing the evaluation metrics specified in `model_fn` keyed by name, as well as an entry `global_step` which contains the value of the global step for which this evaluation was performed.
- experimental_export_all_saved_models(export_dir_base, input_receiver_fn_map, assets_extra=None, as_text=False, checkpoint_path=None)
Exports a `SavedModel` with `tf.MetaGraphDefs` for each requested mode.

For each mode passed in via the `input_receiver_fn_map`, this method builds a new graph by calling the `input_receiver_fn` to obtain feature and label `Tensor`s. Next, this method calls the `Estimator`’s `model_fn` in the passed mode to generate the model graph based on those features and labels, and restores the given checkpoint (or, lacking that, the most recent checkpoint) into the graph. Only one of the modes is used for saving variables to the `SavedModel` (order of preference: `tf.estimator.ModeKeys.TRAIN`, `tf.estimator.ModeKeys.EVAL`, then `tf.estimator.ModeKeys.PREDICT`), such that up to three `tf.MetaGraphDefs` are saved with a single set of variables in a single `SavedModel` directory.

For the variables and `tf.MetaGraphDefs`, this method creates a timestamped export directory below `export_dir_base`, and writes a `SavedModel` into it containing the `tf.MetaGraphDef` for the given mode and its associated signatures.

For prediction, the exported `MetaGraphDef` will provide one `SignatureDef` for each element of the `export_outputs` dict returned from the `model_fn`, named using the same keys. One of these keys is always `tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY`, indicating which signature will be served when a serving request does not specify one. For each signature, the outputs are provided by the corresponding `tf.estimator.export.ExportOutput`s, and the inputs are always the input receivers provided by the `serving_input_receiver_fn`.

For training and evaluation, the `train_op` is stored in an extra collection, and loss, metrics, and predictions are included in a `SignatureDef` for the mode in question.

Extra assets may be written into the `SavedModel` via the `assets_extra` argument. This should be a dict, where each key gives a destination path (including the filename) relative to the assets.extra directory. The corresponding value gives the full path of the source file to be copied. For example, the simple case of copying a single file without renaming it is specified as `{'my_asset_file.txt': '/path/to/my_asset_file.txt'}`.
- Parameters
export_dir_base – A string containing a directory in which to create timestamped subdirectories containing exported `SavedModel`s.
input_receiver_fn_map – dict of `tf.estimator.ModeKeys` to `input_receiver_fn` mappings, where the `input_receiver_fn` is a function that takes no arguments and returns the appropriate subclass of `InputReceiver`.
assets_extra – A dict specifying how to populate the assets.extra directory within the exported `SavedModel`, or `None` if no extra assets are needed.
as_text – whether to write the `SavedModel` proto in text format.
checkpoint_path – The checkpoint path to export. If `None` (the default), the most recent checkpoint found within the model directory is chosen.
- Returns
The string path to the exported directory.
- Raises
ValueError – if any `input_receiver_fn` is `None`, no `export_outputs` are provided, or no checkpoint can be found.
- export_saved_model(export_dir_base, serving_input_receiver_fn, assets_extra=None, as_text=False, checkpoint_path=None, experimental_mode='infer')
Exports inference graph as a `SavedModel` into the given dir.

For a detailed guide, see [Using SavedModel with Estimators](https://tensorflow.org/guide/saved_model#using_savedmodel_with_estimators).

This method builds a new graph by first calling the `serving_input_receiver_fn` to obtain feature `Tensor`s, and then calling this `Estimator`’s `model_fn` to generate the model graph based on those features. It restores the given checkpoint (or, lacking that, the most recent checkpoint) into this graph in a fresh session. Finally it creates a timestamped export directory below the given `export_dir_base`, and writes a `SavedModel` into it containing a single `tf.MetaGraphDef` saved from this session.

The exported `MetaGraphDef` will provide one `SignatureDef` for each element of the `export_outputs` dict returned from the `model_fn`, named using the same keys. One of these keys is always `tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY`, indicating which signature will be served when a serving request does not specify one. For each signature, the outputs are provided by the corresponding `tf.estimator.export.ExportOutput`s, and the inputs are always the input receivers provided by the `serving_input_receiver_fn`.

Extra assets may be written into the `SavedModel` via the `assets_extra` argument. This should be a dict, where each key gives a destination path (including the filename) relative to the assets.extra directory. The corresponding value gives the full path of the source file to be copied. For example, the simple case of copying a single file without renaming it is specified as `{'my_asset_file.txt': '/path/to/my_asset_file.txt'}`.

The `experimental_mode` parameter can be used to export a single train/eval/predict graph as a `SavedModel`. See `experimental_export_all_saved_models` for full docs.
- Parameters
export_dir_base – A string containing a directory in which to create timestamped subdirectories containing exported `SavedModel`s.
serving_input_receiver_fn – A function that takes no argument and returns a `tf.estimator.export.ServingInputReceiver` or `tf.estimator.export.TensorServingInputReceiver`.
assets_extra – A dict specifying how to populate the assets.extra directory within the exported `SavedModel`, or `None` if no extra assets are needed.
as_text – whether to write the `SavedModel` proto in text format.
checkpoint_path – The checkpoint path to export. If `None` (the default), the most recent checkpoint found within the model directory is chosen.
experimental_mode – `tf.estimator.ModeKeys` value indicating which mode will be exported. Note that this feature is experimental.
- Returns
The string path to the exported directory.
- Raises
ValueError – if no `serving_input_receiver_fn` is provided, no `export_outputs` are provided, or no checkpoint can be found.
- export_savedmodel(export_dir_base, serving_input_receiver_fn, assets_extra=None, as_text=False, checkpoint_path=None, strip_default_attrs=False)
Exports inference graph as a `SavedModel` into the given dir. (deprecated)

Warning: THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: This function has been renamed, use `export_saved_model` instead.

For a detailed guide, see [Using SavedModel with Estimators](https://tensorflow.org/guide/saved_model#using_savedmodel_with_estimators).

This method builds a new graph by first calling the `serving_input_receiver_fn` to obtain feature `Tensor`s, and then calling this `Estimator`’s `model_fn` to generate the model graph based on those features. It restores the given checkpoint (or, lacking that, the most recent checkpoint) into this graph in a fresh session. Finally it creates a timestamped export directory below the given `export_dir_base`, and writes a `SavedModel` into it containing a single `tf.MetaGraphDef` saved from this session.

The exported `MetaGraphDef` will provide one `SignatureDef` for each element of the `export_outputs` dict returned from the `model_fn`, named using the same keys. One of these keys is always `tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY`, indicating which signature will be served when a serving request does not specify one. For each signature, the outputs are provided by the corresponding `tf.estimator.export.ExportOutput`s, and the inputs are always the input receivers provided by the `serving_input_receiver_fn`.

Extra assets may be written into the `SavedModel` via the `assets_extra` argument. This should be a dict, where each key gives a destination path (including the filename) relative to the assets.extra directory. The corresponding value gives the full path of the source file to be copied. For example, the simple case of copying a single file without renaming it is specified as `{'my_asset_file.txt': '/path/to/my_asset_file.txt'}`.
- Parameters
export_dir_base – A string containing a directory in which to create timestamped subdirectories containing exported `SavedModel`s.
serving_input_receiver_fn – A function that takes no argument and returns a `tf.estimator.export.ServingInputReceiver` or `tf.estimator.export.TensorServingInputReceiver`.
assets_extra – A dict specifying how to populate the assets.extra directory within the exported `SavedModel`, or `None` if no extra assets are needed.
as_text – whether to write the `SavedModel` proto in text format.
checkpoint_path – The checkpoint path to export. If `None` (the default), the most recent checkpoint found within the model directory is chosen.
strip_default_attrs – Boolean. If `True`, default-valued attributes will be removed from the `NodeDef`s. For a detailed guide, see [Stripping Default-Valued Attributes](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/saved_model/README.md#stripping-default-valued-attributes).
- Returns
The string path to the exported directory.
- Raises
ValueError – if no `serving_input_receiver_fn` is provided, no `export_outputs` are provided, or no checkpoint can be found.
- get_variable_names()
Returns list of all variable names in this model.
- Returns
List of names.
- Raises
ValueError – If the `Estimator` has not produced a checkpoint yet.
- get_variable_value(name)
Returns value of the variable given by name.
- Parameters
name – string or a list of strings; name of the tensor.
- Returns
Numpy array - value of the tensor.
- Raises
ValueError – If the `Estimator` has not produced a checkpoint yet.
- latest_checkpoint()
Finds the filename of the latest saved checkpoint file in `model_dir`.
- Returns
The full path to the latest checkpoint, or `None` if no checkpoint was found.
- property model_fn
Returns the `model_fn` which is bound to `self.params`.
- Returns
The `model_fn`, with signature `def model_fn(features, labels, mode, config)`.
- predict(input_fn, predict_keys=None, hooks=None, checkpoint_path=None, yield_single_examples=True, num_predictions=None)
Yields predictions for given features.
- Parameters
input_fn – A function that constructs the features. The function should return a `tf.data.Dataset` object. The outputs of the `Dataset` object should be one of the following:
  - features: A `Tensor` or a dictionary of string feature name to `Tensor`. features are consumed by `model_fn`.
  - A tuple, in which case the first item is extracted as features.
predict_keys – list of `str`, name of the keys to predict. It is used if the `tf.estimator.EstimatorSpec.predictions` is a `dict`. If `predict_keys` is used then the rest of the predictions will be filtered from the dictionary. If `None`, returns all.
hooks – List of `tf.train.SessionRunHook` subclass instances. Used for callbacks inside the prediction call.
checkpoint_path – Path of a specific checkpoint to predict. If `None`, the latest checkpoint in `model_dir` is used. If there are no checkpoints in `model_dir`, prediction is run with newly initialized `Variables` instead of ones restored from checkpoint.
yield_single_examples – If `False`, yields the whole batch as returned by the `model_fn` instead of decomposing the batch into individual elements. This is useful if `model_fn` returns some tensors whose first dimension is not equal to the batch size.
num_predictions – If not `None`, the generator will raise `StopIteration` after yielding this number of predictions. This allows draining the generator by using `list(predictions)`. If `None`, the returned generator is infinite and will trigger a fatal error if you try to consume more predictions from it than what is actually generated, instead of raising the `StopIteration` exception. This is caused by the current behaviour when requesting to run a loop on the IPU for more iterations than there are elements remaining in the dataset. In this case you cannot drain it by using `list(predictions)`; you have to consume the expected number of elements yourself, e.g. using `[next(predictions) for _ in range(num_predictions)]`.
- Yields
Evaluated values of `predictions` tensors.
- train(input_fn, hooks=None, steps=None, max_steps=None, saving_listeners=None)
Trains a model given training data `input_fn`.
- Parameters
input_fn – A function that provides input data for training as minibatches. The function should return a `tf.data.Dataset` object. The outputs of the `Dataset` object must be a tuple `(features, labels)` where `features` is a `tf.Tensor` or a dictionary of string feature name to `Tensor`, and `labels` is a `Tensor` or a dictionary of string label name to `Tensor`. Both `features` and `labels` are consumed by `model_fn`.
hooks – List of `tf.train.SessionRunHook` subclass instances. Used for callbacks inside the training loop.
steps – Number of steps for which to train the model. `steps` works incrementally: if you call `train(steps=10)` twice, training occurs for 20 steps in total. If you don't want incremental behaviour, set `max_steps` instead. If set, `max_steps` must be `None`.
max_steps – Number of total steps for which to train the model. If set, `steps` must be `None`. Two calls to `train(steps=100)` means 200 training iterations. On the other hand, two calls to `train(max_steps=100)` means that the second call will not do any iterations since the first call did all 100 steps.
saving_listeners – list of `CheckpointSaverListener` objects. Used for callbacks that run immediately before or after checkpoint savings.
- Returns
`self`, for chaining.
- class tensorflow.python.ipu.ipu_pipeline_estimator.IPUPipelineEstimatorSpec(mode, computational_stages, gradient_accumulation_count=None, pipeline_depth=None, eval_metrics_fn=None, optimizer_function=None, device_mapping=None, pipeline_schedule=None, offload_weight_update_variables=None, inputs=None)
Ops and objects returned from a `model_fn` and passed to `IPUPipelineEstimator`.
.- static __new__(cls, mode, computational_stages, gradient_accumulation_count=None, pipeline_depth=None, eval_metrics_fn=None, optimizer_function=None, device_mapping=None, pipeline_schedule=None, offload_weight_update_variables=None, inputs=None)
Creates a validated `IPUPipelineEstimatorSpec` instance. (deprecated arguments)

Warning: SOME ARGUMENTS ARE DEPRECATED: `(pipeline_depth)`. They will be removed in a future version. Instructions for updating: pipeline_depth is deprecated, use gradient_accumulation_count instead.

Depending on the value of `mode`, different arguments are required. Namely:
- For `mode == ModeKeys.TRAIN`: the `optimizer_function` is required.
- For `mode == ModeKeys.EVAL`: the `eval_metrics_fn` is required.

Refer to the `pipelining_ops` documentation for more details about pipelining.
- Parameters
mode – A `ModeKeys`. Specifies if this is training, evaluation or prediction.
computational_stages – a list of Python functions, where each function represents a computational pipeline stage. The function takes the outputs of the previous pipeline stage as its inputs.
gradient_accumulation_count – the number of times each pipeline stage will be executed.
eval_metrics_fn – a Python function which takes the output of the last computational stage as parameters and returns a dict of evaluation metrics. The dict must contain a loss tensor value with the key "loss". This function will be called on the host.
optimizer_function – a Python function which takes the output of the last computational stage as parameters and returns an instance of `OptimizerFunctionOutput` in order to generate the back-propagation and weight-update parts of the model suitable for training.
device_mapping – optional stage to IPU mapping override.
pipeline_schedule – the scheduling algorithm to use for pipeline lowering. Must be of type `PipelineSchedule`.
offload_weight_update_variables – If True, any `tf.Variable` which is only used by the weight update of the pipeline (for example the accumulator variable when using the `tf.MomentumOptimizer`), will be stored in remote memory. During the weight update this variable will be streamed onto the device and then streamed back to remote memory after it has been updated. Requires the machine to be configured with support for Poplar remote buffers. Offloading variables into remote memory can reduce maximum memory liveness, but can also increase the computation time of the weight update. Note that this option has no effect for inference-only pipelines.
inputs – arguments passed to the first pipeline stage. Can be used to pass e.g. a learning rate tensor or the `tf.train.get_global_step()` tensor that cannot be accessed directly from within a pipeline stage function.
- Returns
A validated `IPUPipelineEstimatorSpec` object.
- Raises
ValueError – If validation fails.
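For orientation, a minimal sketch of a `model_fn` returning an `IPUPipelineEstimatorSpec` for training. The layer sizes and learning rate are hypothetical, a `model_fn` taking only `mode` is assumed, and the import path for `pipelining_ops` is assumed; the stage structure follows the pipelining example earlier in this document:

import tensorflow as tf
from tensorflow.python import ipu
from tensorflow.python.ipu import pipelining_ops

def my_model_fn(mode):
    def stage1(features, labels):
        partial = tf.layers.dense(features, 128, activation=tf.nn.relu)
        return partial, labels

    def stage2(partial, labels):
        logits = tf.layers.dense(partial, 10)
        loss = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=labels, logits=logits))
        return loss

    def optimizer_function(loss):
        # Generates the backward pass and weight update for training.
        optimizer = tf.train.GradientDescentOptimizer(0.01)
        return pipelining_ops.OptimizerFunctionOutput(optimizer, loss)

    return ipu.ipu_pipeline_estimator.IPUPipelineEstimatorSpec(
        mode,
        computational_stages=[stage1, stage2],
        gradient_accumulation_count=8,
        optimizer_function=optimizer_function)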
14.10.3. Run configs
- class tensorflow.python.ipu.ipu_run_config.IPURunConfig(iterations_per_loop=1, ipu_options=None, compile_summary=False, num_replicas=1, num_shards=1, autosharding=False, ordinal=0)
IPU related configuration required by `IPUEstimator`.
- Parameters
iterations_per_loop – This is the number of iterations running on the IPU device before returning to the CPU host for each `Session.run`. This means that the global step is increased `iterations_per_loop` times in one `Session.run`.
ipu_options – An IpuOptions configuration protobuf which is populated prior to being passed into IPURunConfig. Note that if more than one device is being used then `ipu_options` needs to be populated with a `device_config`.
compile_summary – Generate a compilation summary.
num_replicas – Number of replicated graphs (data parallelism).
num_shards – Number of IPU devices on which the graph is sharded (model parallelism).
autosharding – Use the IPU `automatic_sharding` to automatically shard the graph across `num_shards` devices.
ordinal – The IPU device ordinal to use. For instance `0` corresponds to `/device:IPU:0`.
- class tensorflow.python.ipu.ipu_run_config.RunConfig(ipu_run_config=None, master=None, **kwargs)
RunConfig with IPU support.
- __init__(ipu_run_config=None, master=None, **kwargs)
Constructs a RunConfig with IPU support.
These are the arguments specific to the RunConfig for IPUs. All remaining keyword arguments are passed to the base class, which is documented below.
- Parameters
ipu_run_config – `IPURunConfig` object for IPU-specific configuration.
master – a string. The address of the distributed master to use for training.
Constructs a RunConfig.
All distributed training related properties `cluster_spec`, `is_chief`, `master`, `num_worker_replicas`, `num_ps_replicas`, `task_id`, and `task_type` are set based on the `TF_CONFIG` environment variable, if the pertinent information is present. The `TF_CONFIG` environment variable is a JSON object with attributes: `cluster` and `task`.

`cluster` is a JSON serialized version of `ClusterSpec`'s Python dict from `server_lib.py`, mapping task types (usually one of the `TaskType` enums) to a list of task addresses.

`task` has two attributes: `type` and `index`, where `type` can be any of the task types in `cluster`. When `TF_CONFIG` contains said information, the following properties are set on this class:
- `cluster_spec` is parsed from `TF_CONFIG['cluster']`. Defaults to {}. If present, must have one and only one node in the `chief` attribute of `cluster_spec`.
- `task_type` is set to `TF_CONFIG['task']['type']`. Must be set if `cluster_spec` is present; must be `worker` (the default value) if `cluster_spec` is not set.
- `task_id` is set to `TF_CONFIG['task']['index']`. Must be set if `cluster_spec` is present; must be 0 (the default value) if `cluster_spec` is not set.
- `master` is determined by looking up `task_type` and `task_id` in the `cluster_spec`. Defaults to ''.
- `num_ps_replicas` is set by counting the number of nodes listed in the `ps` attribute of `cluster_spec`. Defaults to 0.
- `num_worker_replicas` is set by counting the number of nodes listed in the `worker` and `chief` attributes of `cluster_spec`. Defaults to 1.
- `is_chief` is determined based on `task_type` and `cluster`.

There is a special node with `task_type` as `evaluator`, which is not part of the (training) `cluster_spec`. It handles the distributed evaluation job.

Example of non-chief node:

cluster = {'chief': ['host0:2222'],
           'ps': ['host1:2222', 'host2:2222'],
           'worker': ['host3:2222', 'host4:2222', 'host5:2222']}
os.environ['TF_CONFIG'] = json.dumps(
    {'cluster': cluster,
     'task': {'type': 'worker', 'index': 1}})
config = RunConfig()
assert config.master == 'host4:2222'
assert config.task_id == 1
assert config.num_ps_replicas == 2
assert config.num_worker_replicas == 4
assert config.cluster_spec == server_lib.ClusterSpec(cluster)
assert config.task_type == 'worker'
assert not config.is_chief

Example of chief node:

cluster = {'chief': ['host0:2222'],
           'ps': ['host1:2222', 'host2:2222'],
           'worker': ['host3:2222', 'host4:2222', 'host5:2222']}
os.environ['TF_CONFIG'] = json.dumps(
    {'cluster': cluster,
     'task': {'type': 'chief', 'index': 0}})
config = RunConfig()
assert config.master == 'host0:2222'
assert config.task_id == 0
assert config.num_ps_replicas == 2
assert config.num_worker_replicas == 4
assert config.cluster_spec == server_lib.ClusterSpec(cluster)
assert config.task_type == 'chief'
assert config.is_chief

Example of evaluator node (evaluator is not part of training cluster):

cluster = {'chief': ['host0:2222'],
           'ps': ['host1:2222', 'host2:2222'],
           'worker': ['host3:2222', 'host4:2222', 'host5:2222']}
os.environ['TF_CONFIG'] = json.dumps(
    {'cluster': cluster,
     'task': {'type': 'evaluator', 'index': 0}})
config = RunConfig()
assert config.master == ''
assert config.evaluator_master == ''
assert config.task_id == 0
assert config.num_ps_replicas == 0
assert config.num_worker_replicas == 0
assert config.cluster_spec == {}
assert config.task_type == 'evaluator'
assert not config.is_chief
N.B.: If `save_checkpoints_steps` or `save_checkpoints_secs` is set, `keep_checkpoint_max` might need to be adjusted accordingly, especially in distributed training. For example, setting `save_checkpoints_secs` as 60 without adjusting `keep_checkpoint_max` (defaults to 5) leads to a situation where checkpoints would be garbage collected after 5 minutes. In distributed training, the evaluation job starts asynchronously and might fail to load or find the checkpoint due to a race condition.
- Parameters
model_dir – directory where model parameters, graph, etc are saved. If
PathLike
object, the path will be resolved. IfNone
, will use a default value set by the Estimator.tf_random_seed – Random seed for TensorFlow initializers. Setting this value allows consistency between reruns.
save_summary_steps – Save summaries every this many steps.
save_checkpoints_steps – Save checkpoints every this many steps. Can not be specified with
save_checkpoints_secs
.save_checkpoints_secs – Save checkpoints every this many seconds. Can not be specified with
save_checkpoints_steps
. Defaults to 600 seconds if bothsave_checkpoints_steps
andsave_checkpoints_secs
are not set in constructor. If bothsave_checkpoints_steps
andsave_checkpoints_secs
areNone
, then checkpoints are disabled.session_config – a ConfigProto used to set session parameters, or
None
.keep_checkpoint_max – The maximum number of recent checkpoint files to keep. As new files are created, older files are deleted. If
None
or 0, all checkpoint files are kept. Defaults to 5 (that is, the 5 most recent checkpoint files are kept.)keep_checkpoint_every_n_hours – Number of hours between each checkpoint to be saved. The default value of 10,000 hours effectively disables the feature.
log_step_count_steps – The frequency, in number of global steps, that the global step and the loss will be logged during training. Also controls the frequency that the global steps / s will be logged (and written to summary) during training.
train_distribute – An optional instance of tf.distribute.Strategy. If specified, then Estimator will distribute the user's model during training, according to the policy specified by that strategy. Setting experimental_distribute.train_distribute is preferred.

device_fn – A callable invoked for every Operation that takes the Operation and returns the device string. If None, defaults to the device function returned by tf.train.replica_device_setter with round-robin strategy.

protocol – An optional argument which specifies the protocol used when starting the server. None means default to grpc.

eval_distribute – An optional instance of tf.distribute.Strategy. If specified, then Estimator will distribute the user's model during evaluation, according to the policy specified by that strategy. Setting experimental_distribute.eval_distribute is preferred.

experimental_distribute – An optional tf.contrib.distribute.DistributeConfig object specifying DistributionStrategy-related configuration. The train_distribute and eval_distribute can be passed as parameters to RunConfig or set in experimental_distribute, but not both.

experimental_max_worker_delay_secs – An optional integer specifying the maximum time a worker should wait before starting. By default, workers are started at staggered times, with each worker being delayed by up to 60 seconds. This is intended to reduce the risk of divergence, which can occur when many workers simultaneously update the weights of a randomly initialized model. Users who warm-start their models and train them for short durations (a few minutes or less) should consider reducing this default to improve training times.
session_creation_timeout_secs – Max time workers should wait for a session to become available (on initialization or when recovering a session) with MonitoredTrainingSession. Defaults to 7200 seconds, but users may want to set a lower value to detect problems with variable / session (re)-initialization more quickly.
- Raises

ValueError – If both save_checkpoints_steps and save_checkpoints_secs are set.
14.10.4. Session run hooks
- class tensorflow.python.ipu.ipu_session_run_hooks.IPULoggingTensorHook(every_n_iter=None, every_n_secs=None, at_end=False, formatter=None, logging_mode=IPUOutfeedMode.LAST, feed_name='logging_hook', replication_factor=1)
Prints the given tensors every N local steps, every N seconds, or at end.
This is a version of tf.estimator.LoggingTensorHook that supports logging from inside a function compiled for the IPU. The implementation uses an IPU outfeed in order to send the tensors from the compiled function to the host.

The tensors will be printed to the log, with INFO severity.

- LoggingMode

alias of tensorflow.python.ipu.ipu_outfeed_queue.IPUOutfeedMode
- __init__(every_n_iter=None, every_n_secs=None, at_end=False, formatter=None, logging_mode=IPUOutfeedMode.LAST, feed_name='logging_hook', replication_factor=1)
Initializes the hook.
- Parameters
every_n_iter – int, print the tensor values once every N steps.

every_n_secs – int or float, print the tensor values once every N seconds. Exactly one of every_n_iter and every_n_secs should be provided (unless at_end is True).

at_end – bool specifying whether to print the tensor values at the end of the run.

formatter – function that takes a dict with tensor names and values and returns a string. If None, uses default formatting.

logging_mode – IPULoggingTensorHook.LoggingMode that determines the behaviour when enqueuing multiple tensor values between dequeues (e.g. print all of them or only the last one).

feed_name – string. The name of the outfeed queue. Must be unique.

replication_factor – int, the number of replicas from which logging is performed.
- after_run(run_context, run_values)
Called after each call to run().
The run_values argument contains results of requested ops/tensors by before_run().

The run_context argument is the same one sent to the before_run call. run_context.request_stop() can be called to stop the iteration.

If session.run() raises any exceptions then after_run() is not called.

- Parameters

run_context – A SessionRunContext object.

run_values – A SessionRunValues object.
- begin()
Called once before using the session.
When called, the default graph is the one that will be launched in the session. The hook can modify the graph by adding new operations to it. After the begin() call the graph will be finalized and the other callbacks will not be able to modify the graph anymore. A second call of begin() on the same graph should not change the graph.
- end(session)
Called at the end of session.
The session argument can be used in case the hook wants to run final ops, such as saving a last checkpoint.

If session.run() raises an exception other than OutOfRangeError or StopIteration then end() is not called. Note the difference between the end() and after_run() behavior when session.run() raises OutOfRangeError or StopIteration. In that case end() is called but after_run() is not called.

- Parameters

session – A TensorFlow Session that will soon be closed.
- log(tensors)
Logs the given tensors.

- Parameters

tensors – either a dict from string to tf.Tensor, a list/tuple of tf.Tensor objects, or a tf.Tensor.

- Returns

The logging operation. It might be necessary to add a control dependency on this operation, or include it in the training operation using tf.group(), to prevent it from being pruned from the graph.
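For illustration, here is a minimal sketch of wiring the hook into an IPU-compiled function (the function my_net and the constant input are assumptions, not part of the API): the log() operation is made a control dependency of the returned value so that it is not pruned from the graph.

import tensorflow as tf
from tensorflow.python.ipu import ipu_compiler
from tensorflow.python.ipu import scopes
from tensorflow.python.ipu.ipu_session_run_hooks import IPULoggingTensorHook

# Hypothetical hook; feed_name must be unique per outfeed queue.
hook = IPULoggingTensorHook(every_n_iter=10, feed_name="loss_logging")

def my_net(x):
    loss = tf.reduce_mean(x * x)
    log_op = hook.log({"loss": loss})
    # Tie the log op to the output so it is not pruned.
    with tf.control_dependencies([log_op]):
        return tf.identity(loss)

with scopes.ipu_scope("/device:IPU:0"):
    res = ipu_compiler.compile(my_net, inputs=[tf.constant([1.0, 2.0])])

The hook itself is then passed to the session or Estimator like any other SessionRunHook.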
14.11. Keras layers
Note
tensorflow.python.ipu.keras.layers.GRU is an alias of tensorflow.python.ipu.keras.layers.PopnnGRU

tensorflow.python.ipu.keras.layers.LSTM is an alias of tensorflow.python.ipu.keras.layers.PopnnLSTM
14.11.1. Keras layer specializations for the Graphcore IPU
- class tensorflow.python.ipu.keras.layers.Dropout(rate=0.5, noise_shape=None, seed=None, **kwargs)
Base class for implementing XLA and Popnn compatible Dropout layer.
- build(input_shape)
Creates the variables of the layer (optional, for subclass implementers).
This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call.

This is typically used to create the weights of Layer subclasses.

- Parameters

input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).
- call(inputs, training=None)
This is where the layer’s logic lives.
- Parameters
inputs – Input tensor, or list/tuple of input tensors.
**kwargs – Additional keyword arguments.
- Returns
A tensor or list/tuple of tensors.
- compute_output_shape(input_shape)
Computes the output shape of the layer.
If the layer has not been built, this method will call build on the layer. This assumes that the layer will later be used with inputs that match the input shape provided here.

- Parameters
input_shape – Shape tuple (tuple of integers) or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- Returns
An output shape tuple.
- get_config()
Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

- Returns
Python dictionary.
- class tensorflow.python.ipu.keras.layers.Embedding(input_dim, output_dim, embeddings_initializer='uniform', **kwargs)
This is designed to be a replacement for the typical use cases of the Keras Embedding layer.
- Parameters
input_dim – int > 0. Size of the vocabulary, i.e. maximum integer index + 1.
output_dim – int >= 0. Dimension of the dense embedding.
embeddings_initializer – Initializer for the embeddings matrix.
- Input shape:

2D tensor with shape: (batch_size, input_length).

- Output shape:

3D tensor with shape: (batch_size, input_length, output_dim).
- build(input_shape)
Creates the variables of the layer (optional, for subclass implementers).
This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call.

This is typically used to create the weights of Layer subclasses.

- Parameters

input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).
- call(inputs, training=None)
Perform an embedding lookup.
- Parameters
inputs – An integer tensor of indices into the embedding variable.
- Returns
The entries of the embedding tensor corresponding to the ids tensor indices.
- compute_output_shape(input_shape)
Computes the output shape of the layer.
If the layer has not been built, this method will call build on the layer. This assumes that the layer will later be used with inputs that match the input shape provided here.

- Parameters
input_shape – Shape tuple (tuple of integers) or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- Returns
An output shape tuple.
- get_config()
Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).

- Returns
Python dictionary.
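For example, a minimal sketch of the layer in use (the vocabulary size, embedding width and input shape are illustrative assumptions): a batch of token ids of shape (batch_size, input_length) is embedded on the IPU into shape (batch_size, input_length, output_dim).

import numpy as np
import tensorflow as tf
from tensorflow.python.ipu import ipu_compiler
from tensorflow.python.ipu import scopes
from tensorflow.python.ipu.keras.layers import Embedding

def my_net(ids):
    # 10000-word vocabulary, 64-dimensional embeddings (illustrative values).
    layer = Embedding(input_dim=10000, output_dim=64)
    return layer(ids)

with scopes.ipu_scope("/device:IPU:0"):
    out = ipu_compiler.compile(
        my_net, inputs=[tf.constant(np.zeros((2, 16), dtype=np.int32))])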
- class tensorflow.python.ipu.keras.layers.GroupNorm(dtype=tf.float32, groups=2, channels_axis=-1, center=True, scale=True, epsilon=0.001, beta_initializer=None, gamma_initializer=None, strided_channel_grouping=True, name=None)
Group normalization layer optimized for running on the IPU.
This layer is used like the standard Keras BatchNormalization layer. However, while it has trainable beta and gamma parameters, it does no statistics gathering.
Group normalization is described in this paper: https://arxiv.org/abs/1803.08494.
- Parameters
dtype – The data type for the trainable weights.
groups – The number of groups to use in the normalization.
channels_axis – Integer, the axis that should be normalized (typically the features axis).
center – If True, add offset of beta to the normalized tensor. If False, beta is ignored.

scale – If True, multiply by gamma. If False, gamma is not used.

epsilon – Small float added to variance to avoid dividing by zero.
beta_initializer – Initializer for the beta weight.
gamma_initializer – Initializer for the gamma weight.
strided_channel_grouping – Selects whether to group the channels dimension for group normalisation with a stride between channels. This makes the PopLibs implementation more efficient but is unconventional. Among other things, this means that pre-trained weights cannot be used unless they were produced with this unconventional implementation.
name – Optional name for the layer.
- build(input_shape)
Creates the variables of the layer (optional, for subclass implementers).
This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call.

This is typically used to create the weights of Layer subclasses.

- Parameters

input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).
- call(inputs, training=None)
This is where the layer’s logic lives.
- Parameters
inputs – Input tensor, or list/tuple of input tensors.
**kwargs – Additional keyword arguments.
- Returns
A tensor or list/tuple of tensors.
- class tensorflow.python.ipu.keras.layers.InstanceNorm(dtype=tf.float32, channels_axis=-1, center=True, scale=True, epsilon=0.001, beta_initializer=None, gamma_initializer=None, name=None)
Instance normalization layer optimized for use on the IPU.
This layer is used like the standard Keras BatchNormalization layer. However, while it has trainable beta and gamma parameters, it does no statistics gathering.
Instance normalization is described in this paper: https://arxiv.org/abs/1607.08022.
- Parameters
dtype – The data type for the trainable weights.
channels_axis – Integer, the axis that should be normalized (typically the features axis).
center – If True, add offset of beta to the normalized tensor. If False, beta is ignored.

scale – If True, multiply by gamma. If False, gamma is not used.

epsilon – Small float added to variance to avoid dividing by zero.
beta_initializer – Initializer for the beta weight.
gamma_initializer – Initializer for the gamma weight.
name – Optional name for the layer.
- build(input_shape)
Creates the variables of the layer (optional, for subclass implementers).
This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call.

This is typically used to create the weights of Layer subclasses.

- Parameters

input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).
- call(inputs, training=None)
This is where the layer’s logic lives.
- Parameters
inputs – Input tensor, or list/tuple of input tensors.
**kwargs – Additional keyword arguments.
- Returns
A tensor or list/tuple of tensors.
- class tensorflow.python.ipu.keras.layers.LayerNorm(dtype=tf.float32, channels_axis=-1, center=True, scale=True, epsilon=0.001, beta_initializer=None, gamma_initializer=None, name=None)
Layer normalization layer optimized for use on the IPU.
This layer is used like the standard Keras BatchNormalization layer. However, while it has trainable beta and gamma parameters, it does no statistics gathering.
Layer normalization is described in this paper: https://arxiv.org/abs/1607.06450.
- Parameters
dtype – The data type for the trainable weights.
channels_axis – Integer, the axis that should be normalized (typically the features axis).
center – If True, add offset of beta to the normalized tensor. If False, beta is ignored.

scale – If True, multiply by gamma. If False, gamma is not used.

epsilon – Small float added to variance to avoid dividing by zero.
beta_initializer – Initializer for the beta weight.
gamma_initializer – Initializer for the gamma weight.
name – Optional name for the layer.
- build(input_shape)
Creates the variables of the layer (optional, for subclass implementers).
This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call.

This is typically used to create the weights of Layer subclasses.

- Parameters

input_shape – Instance of TensorShape, or list of instances of TensorShape if the layer expects a list of inputs (one instance per input).
- call(inputs, training=None)
This is where the layer’s logic lives.
- Parameters
inputs – Input tensor, or list/tuple of input tensors.
**kwargs – Additional keyword arguments.
- Returns
A tensor or list/tuple of tensors.
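For example, a minimal sketch showing the normalization layers above used as drop-in Keras layers inside an IPU-compiled function (the input shape and group count are illustrative assumptions):

import tensorflow as tf
from tensorflow.python.ipu import ipu_compiler
from tensorflow.python.ipu import scopes
from tensorflow.python.ipu.keras.layers import GroupNorm, LayerNorm

def my_net(x):
    # 16 channels split into 4 groups; LayerNorm applied afterwards.
    x = GroupNorm(groups=4)(x)
    x = LayerNorm()(x)
    return x

with scopes.ipu_scope("/device:IPU:0"):
    out = ipu_compiler.compile(
        my_net, inputs=[tf.zeros([2, 8, 8, 16], dtype=tf.float32)])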
- class tensorflow.python.ipu.keras.layers.PopnnGRU(units, activation='tanh', recurrent_activation='sigmoid', use_bias=True, kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal', bias_initializer='zeros', kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None, bias_constraint=None, dropout=0.0, dropout_seed=None, recurrent_dropout=0.0, implementation=1, return_sequences=False, return_state=False, go_backwards=False, stateful=False, unroll=False, reset_after=False, seed=None, partials_dtype=tf.float32, time_major=False, **kwargs)
Popnn implementation of the Gated Recurrent Unit - Cho et al. 2014, optimized for the IPU.
There are two variants of the GRU implementation. The default one is based on v3 (https://arxiv.org/abs/1406.1078v3) and has the reset gate applied to the hidden state before the matrix multiplication. The other one is based on the original version (https://arxiv.org/abs/1406.1078v1) and has the order reversed. The first one is the default behaviour for this implementation, however the Keras equivalent can use the second variant. To use this variant, set reset_after=True (currently unsupported).

Note that the Keras equivalent uses hard_sigmoid as the default recurrent activation, however this version uses sigmoid as the default.
units – Positive integer, dimensionality of the output space.
activation – Activation function to use. Default: hyperbolic tangent (tanh). If you pass None, no activation is applied (that is, "linear" activation: a(x) = x).

recurrent_activation – Activation function to use for the recurrent step. Default: sigmoid. If you pass None, no activation is applied (that is, "linear" activation: a(x) = x).

use_bias – Boolean, whether the layer uses a bias vector.

kernel_initializer – Initializer for the kernel weights matrix, used for the linear transformation of the inputs.

recurrent_initializer – Initializer for the recurrent_kernel weights matrix, used for the linear transformation of the recurrent state.

bias_initializer – Initializer for the bias vector.
kernel_regularizer – Unsupported - Regularizer function applied to the kernel weights matrix.

recurrent_regularizer – Unsupported - Regularizer function applied to the recurrent_kernel weights matrix.

bias_regularizer – Unsupported - Regularizer function applied to the bias vector.

activity_regularizer – Unsupported - Regularizer function applied to the output of the layer (its "activation").

kernel_constraint – Unsupported - Constraint function applied to the kernel weights matrix.

recurrent_constraint – Unsupported - Constraint function applied to the recurrent_kernel weights matrix.

bias_constraint – Unsupported - Constraint function applied to the bias vector.
dropout – Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs.
dropout_seed – An optional two-element tensor-like object (tf.Tensor, a numpy array or Python list/tuple), representing the random seed that will be used to create the distribution for dropout.

recurrent_dropout – Unsupported - Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state.
implementation – Unsupported - Implementation mode.
return_sequences – Boolean. Whether to return the last output in the output sequence, or the full sequence.
return_state – Boolean. Whether to return the last state in addition to the output.
go_backwards – Unsupported - Boolean (default False). If True, process the input sequence backwards and return the reversed sequence.
stateful – Unsupported - Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch.
unroll – Unsupported - Boolean (default False). If True, the network will be unrolled, else a symbolic loop will be used. Unrolling can speed-up a RNN, although it tends to be more memory-intensive. Unrolling is only suitable for short sequences.
time_major – The shape format of the inputs and outputs tensors. If True, the inputs and outputs will be in shape (timesteps, batch, ...), whereas in the False case, they will be in shape (batch, timesteps, ...). Using time_major = True is a bit more efficient because it avoids transposes at the beginning and end of the RNN calculation. However, most TensorFlow data is batch-major, so by default this function accepts input and emits output in batch-major form.

seed – A Python integer. Used for the kernel_initializer and recurrent_initializer.

partials_dtype – The type used by Popnn to perform partial calculations. Either tf.float16 or tf.float32.
reset_after – Unsupported - GRU convention (whether to apply reset gate after or before matrix multiplication). False = “before” (default), True = “after”.
- Call arguments:

inputs: A 3D tensor.

training: Python boolean indicating whether the layer should behave in training mode or in inference mode. This argument is passed to the cell when calling it. This is only relevant if dropout or recurrent_dropout is used.

initial_state: List of initial state tensors to be passed to the first call of the cell.
- build(input_shape)
Create variables of the PopnnGRU.
It can be called manually before __call__() or automatically through __call__(). In the former case, any subsequent __call__() will skip creating variables.

- Parameters
input_shape – a TensorShape object with 3 dimensions.
- Raises
ValueError – if input_shape has wrong dimension or unknown 3rd dimension.
- call(inputs, training=None, initial_state=None)
Runs the forward step for the GRU layer.
- Parameters
inputs – 3-D tensor with shape [batch_size, seq_len, input_size]. If the time_major parameter is True, the shape should be [seq_len, batch_size, input_size].

initial_state – Initial state tensor, shaped [batch_size, num_units]. If not provided, the state is initialized to zeros.

training – whether this operation will be used in training or inference.
- Returns
When return_sequences is set, then GRU returns a tensor of shape [batch_size, seq_len, num_units], otherwise it returns a tensor of shape [batch_size, num_units].

output_state: The output state of the last cell, when the parameter return_state is set to True.

- Return type

output
- Raises
ValueError – if initial_state is not valid.
- state_shape(batch_size)
Shape of Popnn GRU state.
State shape is [batch_size, num_units].
- Parameters
batch_size – an int
- Returns
A Python array.
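For example, a minimal sketch of the layer in use (the batch, sequence and feature sizes are illustrative assumptions); with return_sequences=True the output has shape [batch_size, seq_len, units]:

import tensorflow as tf
from tensorflow.python.ipu import ipu_compiler
from tensorflow.python.ipu import scopes
from tensorflow.python.ipu.keras.layers import PopnnGRU

def my_net(x):
    # Batch-major input of shape [batch_size, seq_len, input_size].
    gru = PopnnGRU(units=64, return_sequences=True)
    return gru(x)

with scopes.ipu_scope("/device:IPU:0"):
    out = ipu_compiler.compile(
        my_net, inputs=[tf.zeros([4, 10, 32], dtype=tf.float32)])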
- class tensorflow.python.ipu.keras.layers.PopnnLSTM(units, activation='tanh', recurrent_activation='sigmoid', use_bias=True, kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal', bias_initializer='zeros', unit_forget_bias=True, kernel_regularizer=None, recurrent_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None, bias_constraint=None, dropout=0.0, dropout_seed=None, recurrent_dropout=0.0, implementation=1, return_sequences=False, return_state=False, go_backwards=False, stateful=False, unroll=False, partials_dtype=tf.float32, seed=None, time_major=False, **kwargs)
Popnn implementation of Long Short-Term Memory layer - Hochreiter 1997, optimized for the IPU.
Note that the Keras equivalent uses hard_sigmoid as the default recurrent activation, however this version uses sigmoid as the default.

- Parameters
units – Positive integer, dimensionality of the output space.
activation – Activation function to use. Default: hyperbolic tangent (tanh). If you pass None, no activation is applied (that is, "linear" activation: a(x) = x).

recurrent_activation – Activation function to use for the recurrent step. Default: sigmoid. If you pass None, no activation is applied (that is, "linear" activation: a(x) = x).

use_bias – Boolean, whether the layer uses a bias vector.

kernel_initializer – Initializer for the kernel weights matrix, used for the linear transformation of the inputs.

recurrent_initializer – Initializer for the recurrent_kernel weights matrix, used for the linear transformation of the recurrent state.

bias_initializer – Initializer for the bias vector.
unit_forget_bias – Boolean. If True, add 1 to the bias of the forget gate at initialization. Setting it to True will also force bias_initializer="zeros". This is recommended in Jozefowicz et al., 2015.

kernel_regularizer – Unsupported - Regularizer function applied to the kernel weights matrix.

recurrent_regularizer – Unsupported - Regularizer function applied to the recurrent_kernel weights matrix.

bias_regularizer – Unsupported - Regularizer function applied to the bias vector.

activity_regularizer – Unsupported - Regularizer function applied to the output of the layer (its "activation").

kernel_constraint – Unsupported - Constraint function applied to the kernel weights matrix.

recurrent_constraint – Unsupported - Constraint function applied to the recurrent_kernel weights matrix.

bias_constraint – Unsupported - Constraint function applied to the bias vector.
dropout – Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs.
dropout_seed – An optional two-element tensor-like object (tf.Tensor, a numpy array or Python list/tuple), representing the random seed that will be used to create the distribution for dropout.

recurrent_dropout – Unsupported - Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state.
implementation – Unsupported - Implementation mode.
return_sequences – Boolean. Whether to return the last output in the output sequence, or the full sequence.
return_state – Boolean. Whether to return the last state in addition to the output.
go_backwards – Unsupported - Boolean (default False). If True, process the input sequence backwards and return the reversed sequence.
stateful – Unsupported - Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch.
unroll – Unsupported - Boolean (default False). If True, the network will be unrolled, else a symbolic loop will be used. Unrolling can speed-up a RNN, although it tends to be more memory-intensive. Unrolling is only suitable for short sequences.
seed – A Python integer. Used for the kernel_initializer and recurrent_initializer.

partials_dtype – The type used by Popnn to perform partial calculations. Either tf.float16 or tf.float32.
time_major – The shape format of the inputs and outputs tensors. If True, the inputs and outputs will be in shape (timesteps, batch, ...), whereas in the False case, they will be in shape (batch, timesteps, ...). Using time_major = True is a bit more efficient because it avoids transposes at the beginning and end of the RNN calculation. However, most TensorFlow data is batch-major, so by default this function accepts input and emits output in batch-major form.
- Call arguments:

inputs: A 3D tensor.

training: Python boolean indicating whether the layer should behave in training mode or in inference mode. This argument is passed to the cell when calling it. This is only relevant if dropout or recurrent_dropout is used.

initial_state: List of initial state tensors to be passed to the first call of the cell.
- build(input_shape)
Create variables of the PopnnLSTM.
It can be called manually before __call__() or automatically through __call__(). In the former case, any subsequent __call__() will skip creating variables.

- Parameters
input_shape – a TensorShape object with 3 dimensions.
- Raises
ValueError – if input_shape has wrong dimension or unknown 3rd dimension.
- call(inputs, training=None, initial_state=None)
Runs the forward step for the LSTM layer.
- Parameters
inputs – 3-D tensor with shape [batch_size, seq_len, input_size]. If the time_major parameter is set to True, then the shape should be [seq_len, batch_size, input_size].

initial_state – An LSTMStateTuple of state tensors, each shaped [batch_size, num_units]. If not provided, the state is initialized to zeros.

training – whether this operation will be used in training or inference.
- Returns
When return_sequences is set, then LSTM returns a tensor of shape [batch_size, seq_len, num_units], otherwise it returns a tensor of shape [batch_size, num_units].

output_state: The output state of the last cell, when the parameter return_state is set to True.

- Return type

output
- state_shape(batch_size)
Shape of Popnn LSTM states.
Shape is a 2-element tuple. Each element is [batch_size, num_units].
- Parameters
batch_size – an int
- Returns
a tuple of Python arrays.
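For example, a minimal sketch, assuming (as documented above) that with return_state=True the layer returns the output followed by the final state, and that a time_major=True input is shaped (timesteps, batch, features); the sizes used here are illustrative:

import tensorflow as tf
from tensorflow.python.ipu import ipu_compiler
from tensorflow.python.ipu import scopes
from tensorflow.python.ipu.keras.layers import PopnnLSTM

def my_net(x):
    lstm = PopnnLSTM(units=128, return_state=True, time_major=True)
    outputs, state = lstm(x)
    return outputs, state

with scopes.ipu_scope("/device:IPU:0"):
    res = ipu_compiler.compile(
        my_net, inputs=[tf.zeros([16, 4, 32], dtype=tf.float32)])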
14.12. Operators
It is also possible to access the operators via the tensorflow.python.ipu.ops namespace, for example: tensorflow.python.ipu.ops.normalization_ops.group_norm().
14.12.1. Custom operations
- tensorflow.python.ipu.custom_ops.codelet_expression_op(vertex_expression, *args)
Add a custom fused elementwise expression operation to the graph.
Note that no autograd is done on this fused operation because the autograd code does not understand the internal structure of the fused codelet.
- Parameters
vertex_expression – A Python function that defines the codelet expression.
args – Tensor inputs to the expression.
- Returns
The Tensor which is the result of applying the elementwise operation.
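For example, a minimal sketch: the vertex expression is an ordinary Python function over scalars, here fusing a multiply-add over three tensor inputs (the tensors themselves are illustrative assumptions):

import tensorflow as tf
from tensorflow.python.ipu import custom_ops

def expression(a, b, c):
    # Fused elementwise multiply-add; note that no autograd is generated
    # for this fused codelet.
    return a * b + c

x = tf.constant([1.0, 2.0])
y = tf.constant([3.0, 4.0])
z = tf.constant([5.0, 6.0])
out = custom_ops.codelet_expression_op(expression, x, y, z)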
- tensorflow.python.ipu.custom_ops.cpu_user_operation(inputs, library_path, outs=None, name='UserOp', op_name='Callback', separate_gradients=False, inputs_with_gradients=None, attributes=None, gradient_attributes=None)
Call the CPU function located in the shared library at library_path as part of the normal TensorFlow execution with the given inputs copied from the IPU to the CPU. The outputs are copied back to the IPU afterwards.

The shape and type of the outputs should be specified by outs. If it is None it will default to no output. outs should be a dictionary with two elements like so:

outs = {
    "output_types": [my_types_as_a_list],
    "output_shapes": [my_shapes_as_a_list],
}
- Parameters
inputs – The tensor inputs to the operation.
library_path – The path to the shared object that contains the functions to execute the operation.
outs – A dictionary describing the output tensor shapes and types.
name – The name of the operation.
op_name – The prefix of the functions inside the shared object file. This defaults to 'Callback'.
separate_gradients – When set to True, multiple gradient ops will be generated, one for each input. When False, a single gradient op will be generated, which should produce the partial derivatives for all inputs.

inputs_with_gradients – When set, produce derivatives only for specified inputs. List of input indices expected.

attributes – An optional string object which is passed as an argument to the Poplar function. Allows you to specify function attributes which were not known at the compile time of the C++ Poplar function. Can be used to pass a JSON or ProtoBuf serialized string to the Poplar function for ease of use. See the documentation for examples.

gradient_attributes – Same as attributes, however this is passed as the attributes argument to the gradient operations (if training).
- Returns
The array of tensor outputs.
- tensorflow.python.ipu.custom_ops.precompiled_user_op(inputs, library_path, gp_path='', outs=None, name='UserOp', op_name='Build', separate_gradients=False, inputs_with_gradients=None, attributes=None, gradient_attributes=None)
Call the Poplar function located in the shared library at library_path as part of the normal TensorFlow execution with the given inputs.

The shape and type of the output should be specified by outs. If it is None it will default to no output. outs should be a dictionary with two elements like this:

outs = {
    "output_types": [my_types_as_a_list],
    "output_shapes": [my_shapes_as_a_list],
}
- Parameters
inputs – The tensor inputs to the operation.
library_path – The path to the shared object that contains the functions to build the Poplar operation in the graph.
gp_path – The path to the precompiled codelet file.
outs – A dictionary describing the output tensor shapes and types.
name – The name of the operation.
op_name – The prefix of the functions inside the shared object file. This defaults to 'Build'.
separate_gradients – When set to True, multiple gradient ops will be generated, one for each input. When False, a single gradient op will be generated, which should produce the partial derivatives for all inputs.
inputs_with_gradients – When set, produce derivatives only for specified inputs. List of input indices expected.
attributes – An optional string object which is passed as an argument to the Poplar function. Allows you to specify function attributes which were not known at the compile time of the C++ Poplar function. Can be used to pass a JSON or ProtoBuf serialized string to the Poplar function for ease of use. See the documentation for examples.
gradient_attributes – Same as attributes, however this is passed as the attributes argument to the gradient operations (if training).
- Returns
The array of tensor outputs.
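For example, a minimal sketch of calling the operation (the library path, codelet path and output shape are hypothetical build artifacts, not shipped files):

import tensorflow as tf
from tensorflow.python.ipu import custom_ops

# Hypothetical single float32 output of 128 elements.
outs = {
    "output_types": [tf.float32],
    "output_shapes": [tf.TensorShape([128])],
}

def my_net(x, y):
    # libmy_op.so and my_codelet.gp are hypothetical paths.
    return custom_ops.precompiled_user_op(
        [x, y],
        library_path="libmy_op.so",
        gp_path="my_codelet.gp",
        outs=outs)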
14.12.2. Functional operators
- tensorflow.python.ipu.functional_ops.function(func, name=None)
A function is a block of organized, reusable code which is used to perform a single action. Functions provide better modularity for your application and a high degree of code reuse, which can decrease memory usage at the expense of passing the arguments around.
Functions can be used by models constrained by memory which have common structures or to serialize some large operations.
If the provided function contains any stateful operations, such as stateful random number generation, then the function cannot be reused and it will be inlined automatically.
See the documentation for more details and examples.
- Parameters
func – A Python function which takes a list of positional arguments only. All the arguments must be tf.Tensor-like objects, or be convertible to them. See the documentation for examples of how to pass non-tf.Tensor-like objects to the functions. The function provided must return at least one tf.Tensor-like object.

name – The name of the function.
- Returns
An Operation that executes the function.
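For example, a minimal sketch, assuming decorator-style usage (the names block and my_net are illustrative): the same block of layers is invoked twice, so its code need only be generated once on the device.

import tensorflow as tf
from tensorflow.python.ipu import functional_ops

@functional_ops.function
def block(x, w):
    # Reusable unit: one matmul followed by a relu.
    return tf.nn.relu(tf.matmul(x, w))

def my_net(x, w1, w2):
    # Two calls share one compiled copy of the block's code.
    x = block(x, w1)
    x = block(x, w2)
    return x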
14.12.3. Graphcore utility operations
- tensorflow.python.ipu.internal_ops.print_tensor(input, name='')
Print the specified input.
- Parameters
input – The tensor to print.
name – Optional op name.
- Returns
An operator that prints the specified input to the standard error. For the tensor to be printed, you must either return it as part of your XLA function (which is consumed by ipu_compiler.compile), include the returned op in the input to session.run, or use the operator as a control dependency for executed ops by specifying with tf.control_dependencies([print_op]).
Examples
Returning the print operation as part of the XLA function:
import tensorflow as tf
from tensorflow.python.ipu import internal_ops
from tensorflow.python.ipu import ipu_compiler
from tensorflow.python.ipu import scopes

def my_net(v):
    print_op = internal_ops.print_tensor(v)
    v = v + 1
    return v, print_op

with scopes.ipu_scope("/device:IPU:0"):
    res = ipu_compiler.compile(my_net, inputs=[v])

...
Including the print operation in session.run:
import numpy as np
import tensorflow as tf
from tensorflow.python.ipu import internal_ops
from tensorflow.python.ipu import scopes

with scopes.ipu_scope("/device:IPU:0"):
    pa = tf.placeholder(np.float32, [2, 2], name="a")
    print_op = internal_ops.print_tensor(pa)
    x = pa + 1

with tf.Session() as session:
    result = session.run([x, print_op], feed_dict={pa: np.ones([2, 2])})

...
Using control dependencies:
import numpy as np
import tensorflow as tf
from tensorflow.python.ipu import internal_ops
from tensorflow.python.ipu import scopes

with scopes.ipu_scope("/device:IPU:0"):
    pa = tf.placeholder(np.float32, [2, 2], name="a")
    print_op = internal_ops.print_tensor(pa)
    with tf.control_dependencies([print_op]):
        x = pa + 1

with tf.Session() as session:
    result = session.run(x, feed_dict={pa: np.ones([2, 2])})

...
14.12.4. IPU specific maths operations
- tensorflow.python.ipu.math_ops.serialized_matmul(a, b, serialization_factor, serialization_dimension, transpose_a=False, transpose_b=False, name=None)
Multiplies matrix a by matrix b, producing a * b, with the multiplication being serialized on one of the dimensions.
Serializing a matrix multiplication operation can reduce the code size of the multiplication at the expense of extra computation due to copying of tensors.
The inputs must, following any transpositions, be tensors of rank >= 2 where the inner 2 dimensions specify valid matrix multiplication dimensions, and any further outer dimensions specify matching batch size.
Either matrix can be transposed on the fly by setting one of the corresponding flags to True. These are False by default.
Given the tensor a with shape [..., m, k] and tensor b with shape [..., k, n] after the transpositions, the matrix multiplication can be serialized as follows:

Along the columns dimension of a (the m-dimension), by setting serialization_dimension to a_columns.

Along the rows dimension of a and the columns dimension of b (the k-dimension), by setting serialization_dimension to a_rows_b_columns.

Along the rows dimension of b (the n-dimension), by setting serialization_dimension to b_rows.
Note that taking a gradient of a serialized matrix multiplication means that the backward propagation of the matrix multiply will also be serialized.
Note that adjoint and sparse matrices are not supported.
- Parameters
a – tf.Tensor of type float16, float32 or int32, with rank >= 2.

b – tf.Tensor with the same type and rank as a.

serialization_factor – An integer indicating the number of smaller matrix multiplies this operation is broken up into. Must divide the dimension along which the operation is serialized.
serialization_dimension – A string, must be one of a_columns, a_rows_b_columns or b_rows. Indicates the dimension along which the operation is serialized.

transpose_a – If True, a is transposed before multiplication.
transpose_b – If True, b is transposed before multiplication.
name – Name for the operation (optional).
- Returns
A tf.Tensor of the same type as a and b where each inner-most matrix is the product of the corresponding matrices in a and b, e.g. if all transpose attributes are False:

output[..., i, j] = sum_k (a[..., i, k] * b[..., k, j]), for all indices i, j.
14.12.5. Pipelining operators
- class tensorflow.python.ipu.pipelining_ops.IntEnum(value)
Enum where members are also (and must be) ints
- class tensorflow.python.ipu.pipelining_ops.OptimizerFunctionOutput(opt, loss)
A helper class used for returning a structured output from an optimizer_function in a pipeline.
- __init__(opt, loss)
Creates an OptimizerFunctionOutput object.
- Parameters
opt – An instance of optimizer.Optimizer which is used to generate the back-propagation and the weight update pipeline stages.

loss – The loss which is passed to the optimizer.
- class tensorflow.python.ipu.pipelining_ops.PipelineSchedule(value)
The PipelineSchedule describes how stages are interleaved on the IPUs servicing the pipeline. The forward and backward passes of each stage will execute on the same IPUs. So, in the core of the pipeline there is a choice as to whether to run the forward stages together, or the backward stages and the forward stages together.
- Grouped
This groups the forward passes on multiple IPUs. This requires more memory since activations need to be stored until the backward stages run together. However, since forward passes tend to be smaller than backward passes, Grouped tends to improve the speed of the execution, as different IPUs don’t spend so much time waiting for each other.
- Interleaved
This schedules the backward passes whenever the forward passes have just generated some activations. Consequently fewer activations are required to be stored between the forward and backward pipeline stages, so less memory is required. However, since forward and backward stages tend to be very different in terms of execution cycles, the overall performance of the pipeline tends to be slower.
- Sequential
This is a debug mode, where the pipeline is scheduled in the same way as if it were a sharded model.
- class tensorflow.python.ipu.pipelining_ops.PipelineStageOptions(convolution_options=None, matmul_options=None)
A helper class which can be used to configure Poplar compilation options (such as 'availableMemoryProportion') inside a pipeline forward, backward and weight update stage. This will override the global options set by ipu.utils.set_convolution_options and ipu.utils.set_matmul_options.

- __init__(convolution_options=None, matmul_options=None)

Creates a PipelineStageOptions object.
- Parameters
convolution_options – If provided, a dictionary of Poplar option flags for all the convolution operations in the stage.
matmul_options – If provided, a dictionary of Poplar option flags for all the matmul operations in the stage.
- tensorflow.python.ipu.pipelining_ops.deprecated_args(date, instructions, *deprecated_arg_names_or_tuples, **kwargs)
Decorator for marking specific function arguments as deprecated.
This decorator logs a deprecation warning whenever the decorated function is called with the deprecated argument. It has the following format:
Calling <function> (from <module>) with <arg> is deprecated and will be removed after <date>. Instructions for updating:
<instructions>
If date is None, 'after <date>' is replaced with 'in a future version'. <function> includes the class name if it is a method.

It also edits the docstring of the function: ' (deprecated arguments)' is appended to the first line of the docstring and a deprecation notice is prepended to the rest of the docstring.
- Parameters
date – String or None. The date the function is scheduled to be removed. Must be ISO 8601 (YYYY-MM-DD), or None.
instructions – String. Instructions on how to update code using the deprecated function.
*deprecated_arg_names_or_tuples – String or 2-Tuple(String, [ok_vals]). The string is the deprecated argument name. Optionally, an ok-value may be provided. If the user provided argument equals this value, the warning is suppressed.
**kwargs – If warn_once=False is passed, every call with a deprecated argument will log a warning. The default behavior is to only warn the first time the function is called with any given deprecated argument. All other kwargs raise ValueError.
- Returns
Decorated function or method.
- Raises
ValueError – If date is not None or in ISO 8601 format, instructions are empty, the deprecated arguments are not present in the function signature, the second element of a deprecated_tuple is not a list, or if a kwarg other than warn_once is passed.
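For example, a minimal sketch with hypothetical names: callers still passing the old depth argument get a one-time warning pointing them to count.

from tensorflow.python.ipu.pipelining_ops import deprecated_args

@deprecated_args("2021-01-01", "Use `count` instead.", "depth")
def run(count=None, depth=None):
    # Hypothetical function; prefers the new argument when both are given.
    return count if count is not None else depth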
- tensorflow.python.ipu.pipelining_ops.pipeline(computational_stages, pipeline_depth=None, gradient_accumulation_count=None, repeat_count=1, batch_serialization_iterations=1, inputs=None, infeed_queue=None, outfeed_queue=None, optimizer_function=None, device_mapping=None, pipeline_schedule=None, forward_propagation_stages_poplar_options=None, backward_propagation_stages_poplar_options=None, weight_update_poplar_options=None, offload_weight_update_variables=None, replicated_optimizer_state_sharding=False, offload_activations=None, offload_gradient_accumulation_buffers=None, replicated_weight_sharding=None, offload_weights=None, continuous_weight_updates=False, outfeed_loss=False, name=None)
Sets up a series of computational stages, where the outputs of one stage are the inputs to the next one. (deprecated arguments)

Warning: SOME ARGUMENTS ARE DEPRECATED: (pipeline_depth). They will be removed in a future version. Instructions for updating: pipeline_depth is deprecated, use gradient_accumulation_count instead.

These stages are then executed in parallel across multiple IPUs. This approach can be used to split the model where one or more layers are executed on different IPUs.
The first stage takes the inputs and the infeed_queue (if provided) as its inputs. If the infeed_queue is provided, it is automatically dequeued (similar to the ipu.loops API), therefore care needs to be taken to make sure the signature of the first pipeline stage matches both the arguments from inputs and the infeed_queue, otherwise an error is thrown.

All tensors which are used in the pipeline which are not TensorFlow Variables need to be explicitly passed as inputs to the pipeline. If an input does not change its value during the execution of the pipeline op (for example hyperparameters such as learning rate), it needs to be passed as part of inputs. Alternatively, if these values change during execution (for example the model processes different batches of data) the input should be passed through the infeed_queue (see IPUInfeedQueue).

When training a model, an optional optimizer_function function can be provided. This function takes all the outputs from the last computational stage as inputs, and returns an instance of OptimizerFunctionOutput that is used to generate the backwards pass of the model using the TensorFlow Optimizer API. This will internally create corresponding backpropagation pipeline stages for each pipeline stage and colocate them such that the activations and weights required for the gradient calculation and application stay on the device in order to minimise the number of copies between IPUs.

Note that the gradients, which are calculated by the compute_gradients function, will be accumulated automatically during the execution of the pipeline, unless continuous_weight_updates is enabled.

If the last computational stage has any outputs, then an outfeed_queue (see IPUOutfeedQueue) is required and all the outputs from the last computational stage are enqueued to the outfeed_queue.

Note that pipelining also supports recomputation. To enable it, use the tensorflow.ipu.utils.set_recomputation_options() function when configuring the device.

For example a simple inference network for MNIST can be split across two IPUs:
from tensorflow import keras

# Create the dataset
#...

# Create the data queues from/to IPU.
infeed_queue = ipu_infeed_queue.IPUInfeedQueue(dataset, "infeed")
outfeed_queue = ipu_outfeed_queue.IPUOutfeedQueue("outfeed")

# Create a pipelined model which is split across two stages.
def stage1(image):
    partial = keras.layers.Dense(256, activation=tf.nn.relu)(image)
    partial = keras.layers.Dense(128, activation=tf.nn.relu)(partial)
    return partial

def stage2(partial):
    logits = keras.layers.Dense(10)(partial)
    probabilities = tf.nn.softmax(logits)
    classes = tf.argmax(input=logits, axis=1)
    return probabilities, classes

def model():
    with variable_scope.variable_scope("vs", use_resource=True):
        pipeline_op = pipelining_ops.pipeline(
            computational_stages=[stage1, stage2],
            gradient_accumulation_count=250,
            repeat_count=2,
            inputs=[],
            infeed_queue=infeed_queue,
            outfeed_queue=outfeed_queue,
            device_mapping=[3, 1],
            name="Pipeline")
    return pipeline_op

with ops.device("/device:IPU:0"):
    compiled_model = ipu_compiler.compile(model, inputs=[])

outfeed_op = outfeed_queue.dequeue()
with tf.Session() as sess:
    result = sess.run(compiled_model)
    probabilities, classes = sess.run(outfeed_op)
In this setup, the model is split across two IPUs. By default the first two layers would be executed on the first IPU and the third layer and the probabilities and classes on the second IPU, but here device_mapping is used to override the default IPU allocation so that the first two layers will be executed on the fourth IPU and the third layer and the probabilities and classes on the second IPU.

This creates a pipeline of depth 250 (specified by the gradient_accumulation_count), which means each pipeline stage is executed 250 times.

This pipeline is then executed 2 times (specified by the repeat_count). The results of the pipeline (probabilities and classes) are returned to the host by the outfeed queue.

We can also train this network by providing an optimizer_function:

from tensorflow import keras

# Create the dataset
#...

# Create the data queues from/to IPU.
infeed_queue = ipu_infeed_queue.IPUInfeedQueue(dataset, "infeed")
outfeed_queue = ipu_outfeed_queue.IPUOutfeedQueue("outfeed")

# Create a pipelined model which is split across two stages.
def stage1(lr, images, labels):
    partial = keras.layers.Dense(256, activation=tf.nn.relu)(images)
    partial = keras.layers.Dense(128, activation=tf.nn.relu)(partial)
    return lr, partial, labels

def stage2(lr, partial, labels):
    logits = keras.layers.Dense(10)(partial)
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits)
    loss = tf.reduce_mean(cross_entropy)
    return lr, loss

def optimizer_function(lr, loss):
    optimizer = tf.train.GradientDescentOptimizer(lr)
    return pipelining_ops.OptimizerFunctionOutput(optimizer, loss)

def model(lr):
    with variable_scope.variable_scope("vs", use_resource=True):
        pipeline_op = pipelining_ops.pipeline(
            computational_stages=[stage1, stage2],
            gradient_accumulation_count=128,
            repeat_count=10,
            inputs=[lr],
            infeed_queue=infeed_queue,
            outfeed_queue=outfeed_queue,
            optimizer_function=optimizer_function,
            name="Pipeline")
    return pipeline_op

with ops.device('cpu'):
    lr = tf.placeholder(np.float16, [])

with ops.device("/device:IPU:0"):
    compiled_model = ipu_compiler.compile(model, inputs=[lr])

outfeed_op = outfeed_queue.dequeue()
with tf.Session() as sess:
    result = sess.run(compiled_model, {lr: 0.01})
    losses = sess.run(outfeed_op)
Here the tf.train.GradientDescentOptimizer generates the pipeline stages which calculate the gradients and apply them to the weights. Note how the loss is returned to the host by the outfeed queue.

If a model requires multiple computational pipeline stages to access the same tf.Variable, then all of these computational stages need to be placed on the same IPU using the device_mapping argument.

Note that modifying tf.Variable values in a pipeline stage and/or during the gradient calculation will result in undefined behavior. These variables can only be modified by the apply_gradients member function of the applied Optimizer.

- Parameters
computational_stages – a list of python functions, where each function represents a computational pipeline stage. The function takes the outputs of the previous pipeline state as its inputs.
gradient_accumulation_count – the number of times each pipeline stage will be executed.
repeat_count – the number of times the pipeline will be executed.
batch_serialization_iterations – number of times a loop executes to compute a batch on each pipeline stage execution. Currently only supported with PipelineSchedule.Sequential.

inputs – arguments passed to the first pipeline stage.
infeed_queue – optional IPUInfeedQueue, if passed, it is dequeued and passed as an input in the first pipeline stage.
outfeed_queue – IPUOutfeedQueue, required if the last computational stage has any outputs. The outputs of these are enqueued to this queue and they can be accessed on the host.
optimizer_function – optional Python function which takes the output of the last computational stage as parameters and returns an instance of pipelining_ops.OptimizerFunctionOutput in order to generate the back-propagation and weight-update parts of the model suitable for training.

device_mapping – If provided, a list of length equal to the number of computational stages. An element at index i in the list represents which IPU the computational stage computational_stages[i] should reside on. This can be used to make sure computational stages which share tf.Variables are resident on the same IPU.

pipeline_schedule – Which scheduling algorithm to use for pipeline lowering. Defaults to PipelineSchedule.Grouped.

forward_propagation_stages_poplar_options – If provided, a list of length equal to the number of computational stages. Each element is a PipelineStageOptions object which allows for fine-grained control of the Poplar options for a given forward propagation computational stage.
backward_propagation_stages_poplar_options – If provided, a list of length equal to the number of computational stages. Each element is a PipelineStageOptions object which allows for fine grained control of the Poplar options for a given backward propagation computational stage.
weight_update_poplar_options – If provided, a PipelineStageOptions object which allows for fine grained control of the Poplar options for the weight update stage.
offload_weight_update_variables – When enabled, any tf.Variable which is only used by the weight update of the pipeline (for example the accumulator variable when using the tf.MomentumOptimizer) will be stored in remote memory. During the weight update this variable will be streamed onto the device and then streamed back to the remote memory after it has been updated. Requires the machine to be configured with support for Poplar remote buffers. Offloading variables into remote memory can reduce maximum memory liveness, but can also increase the computation time of the weight update. When set to None the variables will be placed in either in-processor or remote memory automatically based on the current best placement strategy. Note that this option has no effect for inference-only pipelines.

replicated_optimizer_state_sharding – If True, any tf.Variable which is offloaded (for example the accumulator variable when using the tf.MomentumOptimizer) will be partitioned across the replicas. This can exploit the additional bandwidth of the IPU-Links to improve overall throughput. Note that this option has no effect for inference-only pipelines.

offload_activations – When enabled, all the activations for the batches which are not being executed by the pipeline stages at the given time are stored in remote memory. Requires the machine to be configured with support for Poplar remote buffers. Offloading activations into remote memory can reduce maximum memory liveness, but can also increase the computation time as activations have to be copied from/to the device(s). When set to None, the activations might be offloaded when beneficial. This feature is currently only supported when the pipeline schedule is PipelineSchedule.Sequential and batch_serialization_iterations > 1.

offload_gradient_accumulation_buffers – When enabled, all the gradient accumulation buffers are stored in remote memory. Offloading gradient accumulation buffers into remote memory can reduce maximum memory liveness, but can also increase the computation time as the buffers have to be copied to the device, updated and then copied off the device. Requires the machine to be configured with support for Poplar remote buffers. When set to None, the gradient accumulation buffers might be offloaded when beneficial. Note that this option has no effect for inference-only pipelines.

replicated_weight_sharding – When enabled and running a replicated model, any tf.Variables used by the pipeline stage computations (excluding those only used by the weight update) will be partitioned across the replicas. Whenever a partitioned tf.Variable is accessed, it will first be all-gathered across replicas to make sure each replica has access to the whole tf.Variable. This can exploit the additional bandwidth of the IPU-Links to improve overall throughput. When set to None, the variables might be partitioned when beneficial. This feature is enabled by default when the pipeline schedule is PipelineSchedule.Sequential and batch_serialization_iterations > 1, where this option can reduce the memory usage at the cost of extra communication.

offload_weights – When enabled and replicated_weight_sharding is enabled, any tf.Variable which is partitioned across replicas will be stored in Poplar remote buffers. Offloading variables into remote memory can further reduce maximum memory liveness, but can also increase the computation time due to extra communication. When set to None the variables will be placed in either in-processor or remote memory automatically based on the current best placement strategy.

continuous_weight_updates – ** CURRENTLY UNIMPLEMENTED ** When training, this option will apply the gradients to the resource variables immediately, rather than accumulating the gradients and applying them at the end of each execution of the pipeline.

outfeed_loss – If True, the loss given by the optimizer_function will be enqueued on the outfeed, instead of the outputs from the last computational stage.

name – name of this pipeline.
- Returns
An Operation that executes the pipeline.
14.12.6. Popnn primitive neural network operators
- tensorflow.python.ipu.nn_ops.gelu(x, name=None)
This targets the PopLibs Popnn gelu operation, optimised for execution on the IPU.
- Parameters
x – The input tensor.
name – Optional op name.
- Returns
A Tensor. Has the same type as the input tensor.
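For example, a minimal sketch of applying the op inside an IPU scope (the input values are illustrative):

import tensorflow as tf
from tensorflow.python.ipu import nn_ops
from tensorflow.python.ipu import scopes

with scopes.ipu_scope("/device:IPU:0"):
    x = tf.constant([[-1.0, 0.0, 1.0]])
    y = nn_ops.gelu(x)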
- tensorflow.python.ipu.nn_ops.multi_conv(func=None, options=None)
A function decorator for generating multi-convolution operations. Multi-convolutions allow for a set of data-independent convolutions to be executed in parallel. Executing convolutions in parallel can lead to an increase in the data throughput.
The multi_conv function decorator is a convenient way to generate multi-convolutions - it detects all the convolution operations inside of the decorated function and executes them in parallel.
from tensorflow import keras
from tensorflow.python import ipu

@ipu.nn_ops.multi_conv
def convs(x, y, z):
    x = keras.layers.DepthwiseConv2D(8, 2, depth_multiplier=2)(x)
    y = keras.layers.DepthwiseConv2D(16, 4, depth_multiplier=2)(y)
    z = keras.layers.Conv2D(8, 3)(z)
    return x, y, z
This will detect and execute the three convolutions x, y and z in parallel. Note that any operations which are not convolutions, such as bias add operations, will be executed in the same way as if they were not inside of a multi_conv decorated function.
It is also possible to set PopLibs multi-convolution options using this decorator. For example:

from tensorflow import keras
from tensorflow.python import ipu

@ipu.nn_ops.multi_conv(options={"perConvReservedTiles": "50"})
def convs(x, y, z):
  x = keras.layers.DepthwiseConv2D(8, 2, depth_multiplier=2)(x)
  y = keras.layers.DepthwiseConv2D(16, 4, depth_multiplier=2)(y)
  z = keras.layers.Conv2D(8, 3)(z)
  return x, y, z
See the PopLibs documentation for the list of all available flags. Note that these options will also be applied to the gradient operations generated during backpropagation.
- Parameters
func – A Python function which takes a list of positional arguments only. All the arguments must be tf.Tensor-like objects, or be convertible to them. The function provided must return at least one tf.Tensor-like object.
options – A dictionary of Poplar option flags for multi-convolution. See the multi-convolution PopLibs documentation for available flags.
14.12.7. Popnn normalization operators
- tensorflow.python.ipu.normalization_ops.group_norm(inputs, groups=2, channels_axis=-1, reduction_axes=None, center=True, scale=True, epsilon=1e-06, param_initializers=None, reuse=None, variables_collections=None, training=True, trainable=True, scope=None, strided_channel_grouping=True)
Functional interface for the group normalization layer. (deprecated arguments)
Warning: SOME ARGUMENTS ARE DEPRECATED: (reduction_axes). They will be removed in a future version. Instructions for updating: reduction_axes is deprecated as it has no effect.
Reference: https://arxiv.org/abs/1803.08494.
“Group Normalization”, Yuxin Wu, Kaiming He
- Parameters
inputs – A Tensor with at least 2 dimensions, one of which is the channels dimension. All shape dimensions must be fully defined.
groups – Integer. Divide the channels into this number of groups over which normalization statistics are computed. This number must be commensurate with the number of channels in inputs.
channels_axis – An integer. Specifies the index of the channels axis, which will be broken into groups, across each of which the statistics will be computed. Preferred usage is to specify negative integers to be agnostic as to whether a batch dimension is included.
reduction_axes – Deprecated.
center – If True, add offset of beta to the normalized tensor. If False, beta is ignored.
scale – If True, multiply by gamma. If False, gamma is not used. When the next layer is linear (also e.g. nn.relu), this can be disabled since the scaling can be done by the next layer.
epsilon – Small float added to variance to avoid dividing by zero.
param_initializers – Optional initializers for beta and gamma.
reuse – Whether or not the layer and its variables should be reused. To be able to reuse the layer, scope must be given.
variables_collections – Optional collections for the variables.
training – Whether this operation is being used in a training network.
trainable – If True, also add variables to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).
scope – Optional scope for variable_scope.
strided_channel_grouping – Selects whether to group the channels dimension for group normalisation with a stride between channels. Enabling this makes the PopLibs implementation more efficient but is unconventional. Among other things, this means that pre-trained weights cannot be used unless they were produced with this unconventional implementation.
- Returns
A Tensor representing the output of the operation.
- Raises
ValueError – If the rank of inputs is undefined.
ValueError – If rank or channels dimension of inputs is undefined.
ValueError – If channels dimension is not 1 or 3.
ValueError – If number of groups is not commensurate with number of channels.
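For example, a minimal sketch of calling group_norm on an NHWC tensor (the shape and group count are illustrative):

import tensorflow as tf
from tensorflow.python import ipu

def my_net(x):
  # Normalize over 4 groups of channels, with channels on the last axis.
  return ipu.normalization_ops.group_norm(x, groups=4, channels_axis=-1)

with ipu.scopes.ipu_scope("/device:IPU:0"):
  x = tf.placeholder(tf.float32, shape=[8, 32, 32, 16])
  [y] = ipu.ipu_compiler.compile(my_net, inputs=[x])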
- tensorflow.python.ipu.normalization_ops.instance_norm(inputs, channels_axis=-1, reduction_axes=None, center=True, scale=True, epsilon=1e-06, param_initializers=None, reuse=None, variables_collections=None, training=True, trainable=True, scope=None)
Functional interface for the instance normalization layer. (deprecated arguments)
Warning: SOME ARGUMENTS ARE DEPRECATED: (reduction_axes). They will be removed in a future version. Instructions for updating: reduction_axes is deprecated as it has no effect.
Reference: https://arxiv.org/abs/1607.08022.
“Instance Normalization: The Missing Ingredient for Fast Stylization” Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky
Instance normalization will generate normalization statistics across the spatial (X,Y,…) dimensions. Each slice along the feature channels dimension (C) is normalized independently. It is equivalent to a group normalization where the number of groups is the same as the size of the feature channels dimension.
- Parameters
inputs – A Tensor with at least 2 dimensions, one of which is the channels dimension. All shape dimensions must be fully defined.
channels_axis – An integer. Specifies the index of the channels axis. Preferred usage is to specify negative integers to be agnostic as to whether a batch dimension is included.
reduction_axes – Deprecated.
center – If True, add offset of beta to the normalized tensor. If False, beta is ignored.
scale – If True, multiply by gamma. If False, gamma is not used. When the next layer is linear (also e.g. nn.relu), this can be disabled since the scaling can be done by the next layer.
epsilon – Small float added to variance to avoid dividing by zero.
param_initializers – Optional initializers for beta and gamma.
reuse – Whether or not the layer and its variables should be reused. To be able to reuse the layer, scope must be given.
variables_collections – Optional collections for the variables.
training – Whether this operation is being used in a training network.
trainable – If True, also add variables to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).
scope – Optional scope for variable_scope.
- Returns
A Tensor representing the output of the operation.
- Raises
ValueError – If data_format is neither NHWC nor NCHW.
ValueError – If the rank of inputs is undefined.
ValueError – If rank or channels dimension of inputs is undefined.
- tensorflow.python.ipu.normalization_ops.layer_norm(inputs, channels_axis=-1, reduction_axes=None, center=True, scale=True, epsilon=1e-06, param_initializers=None, reuse=None, variables_collections=None, training=True, trainable=True, scope=None)
Adds a Layer Normalization layer. (deprecated arguments)
Warning: SOME ARGUMENTS ARE DEPRECATED: (reduction_axes). They will be removed in a future version. Instructions for updating: reduction_axes is deprecated as it has no effect.
Based on the paper:
“Layer Normalization”
Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton
Layer normalization will generate normalization statistics across the spatial (X,Y,…) dimensions and the feature channels dimension (C). It is equivalent to a group normalization where all of the features in the feature channels dimension are put into a single group.
The shapes of beta and gamma are inputs.shape[begin_params_axis:], and this part of the inputs’ shape must be fully defined.
- Parameters
inputs – A Tensor with at least 2 dimensions, one of which is the channels dimension. All shape dimensions must be fully defined.
channels_axis – An integer. Specifies the index of the channels axis. Preferred usage is to specify negative integers to be agnostic as to whether a batch dimension is included.
reduction_axes – Deprecated.
center – If True, add offset of beta to the normalized tensor. If False, beta is ignored.
scale – If True, multiply by gamma. If False, gamma is not used. When the next layer is linear (also e.g. nn.relu), this can be disabled since the scaling can be done by the next layer.
epsilon – Small float added to variance to avoid dividing by zero.
param_initializers – Optional initializers for beta and gamma.
reuse – Whether or not the layer and its variables should be reused. To be able to reuse the layer, scope must be given.
variables_collections – Optional collections for the variables.
training – Whether this operation is being used in a training network.
trainable – If True, also add variables to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).
scope – Optional scope for variable_scope.
- Returns
A Tensor representing the output of the operation, having the same shape and dtype as inputs.
- Raises
ValueError – If the rank of inputs is not known at graph build time, or if inputs.shape[begin_params_axis:] is not fully defined at graph build time.
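For example, a minimal sketch of calling layer_norm (the shape is illustrative):

import tensorflow as tf
from tensorflow.python import ipu

def my_net(x):
  # Normalize across the spatial and feature channel dimensions.
  return ipu.normalization_ops.layer_norm(x)

with ipu.scopes.ipu_scope("/device:IPU:0"):
  x = tf.placeholder(tf.float32, shape=[8, 32, 32, 16])
  [y] = ipu.ipu_compiler.compile(my_net, inputs=[x])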
14.12.8. Popnn recurrent neural network operators
- class tensorflow.python.ipu.rnn_ops.PopnnGRU(num_units, dtype=tf.float32, partials_dtype=tf.float32, seed=None, weights_initializer=None, bias_initializer=None, name=None)
XLA compatible, time-major Popnn implementation of a GRU layer.
Below is a typical workflow:
with tf.Graph().as_default():
  gru = PopnnGRU(num_units, ...)
  outputs, output_state = gru(inputs, initial_state, training=True)
- build(input_shape)
Create variables of the PopnnGRU.
It can be called manually before __call__() or automatically through __call__(). In the former case, any subsequent __call__() will skip creating variables.
- Parameters
input_shape – a TensorShape object with 3 dimensions.
- Raises
ValueError – if input_shape has wrong dimension or unknown 3rd dimension.
- call(inputs, initial_state=None, training=True)
Runs the forward step for the GRU model.
- Parameters
inputs – 3-D tensor with shape [time_len, batch_size, input_size].
initial_state – Initial state tensor, shaped [batch_size, num_units]. If not provided, the state is initialized to zeros.
training – whether this operation will be used in training or inference.
- Returns
A tuple of (output, output_state), where output is a tensor of shape [time_len, batch_size, num_units] and output_state is the output state of the last cell.
- Raises
ValueError – if initial_state is not valid.
- state_shape(batch_size)
Shape of Popnn GRU state.
State shape is [batch_size, num_units].
- Parameters
batch_size – an int
- Returns
A python array.
- class tensorflow.python.ipu.rnn_ops.PopnnLSTM(num_units, dtype=tf.float32, partials_dtype=tf.float32, seed=None, weights_initializer=None, bias_initializer=None, name=None)
XLA compatible, time-major Popnn implementation of an LSTM layer.
Below is a typical workflow:
with tf.Graph().as_default():
  lstm = PopnnLSTM(num_units, ...)
  outputs, output_states = lstm(inputs, initial_states, training=True)
- build(input_shape)
Create variables of the PopnnLSTM.
It can be called manually before __call__() or automatically through __call__(). In the former case, any subsequent __call__() will skip creating variables.
- Parameters
input_shape – a TensorShape object with 3 dimensions.
- Raises
ValueError – if input_shape has wrong dimension or unknown 3rd dimension.
- call(inputs, initial_state=None, training=True)
Runs the forward step for the LSTM model.
- Parameters
inputs – 3-D tensor with shape [time_len, batch_size, input_size].
initial_state – An LSTMStateTuple of state tensors, each shaped [batch_size, num_units]. If not provided, the state is initialized to zeros. DEPRECATED: a tuple of tensors (input_h_state, input_c_state), each of shape [batch_size, num_units].
training – whether this operation will be used in training or inference.
- Returns
output: a tensor of shape [time_len, batch_size, num_units].
output_states: an LSTMStateTuple of the same shape and structure as initial_state. If the initial state used the deprecated behaviour of not passing LSTMStateTuple, then a tuple (output_h_state, output_c_state) is returned.
- Return type
tuple of output and output states
- Raises
ValueError – if initial_state is not valid.
- state_shape(batch_size)
Shape of Popnn LSTM states.
Shape is a 2-element tuple. Each is [batch_size, num_units]
- Parameters
batch_size – an int
- Returns
a tuple of python arrays.
14.12.9. Popops all to all and all gather operators
- tensorflow.python.ipu.all_to_all_op.all_gather(x, replication_factor, name)
Gather the data on all replicas to all other replicas. Each replica will have the exact same output.
- Parameters
x – The tensor to gather
replication_factor – The replication factor of the model.
name – Optional op name.
- Returns
A tensor of [num_replicas][x] with each replica having the same tensor.
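For example, a hedged sketch assuming the IPU system has been configured elsewhere with a replication factor of 2:

from tensorflow.python import ipu

def my_net(x):
  # Each replica contributes its local x; every replica receives the
  # gathered tensor containing the values from all replicas.
  return ipu.all_to_all_op.all_gather(x, replication_factor=2, name="gather")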
- tensorflow.python.ipu.all_to_all_op.all_to_all(x, split_dimension, concat_dimension, replication_factor, name=None)
Perform an XLA all to all operation across all replicas. (See https://www.tensorflow.org/xla/operation_semantics#alltoall)
- Parameters
x – The input tensor.
split_dimension – A value in the interval [0,n) that names the dimension along which the operand is split.
concat_dimension – A value in the interval [0,n) that names the dimension along which the split blocks are concatenated.
replication_factor – The replication factor of the model.
name – Optional op name.
- Returns
A tensor of the same size where each replica will have a different value.
14.12.10. Popops cross replica operators
- tensorflow.python.ipu.cross_replica_ops.cross_replica_sum(x, name=None)
Sum the input tensor across replicas.
- Parameters
x – The local tensor to sum.
name – Optional op name.
- Returns
A Tensor which is summed across replicas.
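For example, a hedged sketch assuming a replicated graph configured elsewhere:

from tensorflow.python import ipu

def my_net(x):
  # Each replica contributes its local x; all replicas receive the sum.
  return ipu.cross_replica_ops.cross_replica_sum(x)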
14.12.11. Popops embedding operators
- class tensorflow.python.ipu.embedding_ops.HostEmbedding(name, embedding_tensor, partition_strategy='TOKEN', optimizer_spec=None)
Host Embedding wrapper.
HostEmbedding encapsulates the embedding tensor and the additional meta-data required to coordinate the host embedding and the device lookup. Through an instance of this class, an IPU can perform lookups on an embedding that resides on the host.
It is assumed that the given embedding will be rank two where the outermost dimension (dimension zero) is the token dimension, and the innermost dimension is the encoding dimension.
- __init__(name, embedding_tensor, partition_strategy='TOKEN', optimizer_spec=None)
Create a HostEmbedding.
- Parameters
name – The name which uniquely identifies the embedding.
embedding_tensor – The tensor which holds the embedding.
optimizer_spec – A description of how the embedding will be optimized. When None, the embedding is assumed to not be trainable.
- get_embedding_tensor()
Retrieve the CPU bound embedding tensor.
- Returns
The TF CPU tensor for the embedding.
- lookup(indices, count=1, clip_indices=True)
Perform a host embedding lookup on an IPU. (deprecated arguments)
Warning: SOME ARGUMENTS ARE DEPRECATED: (count). They will be removed in a future version. Instructions for updating: This argument no longer has any effect.
- Parameters
indices – The indices to lookup.
count – The number of times, per iteration, that this op will be executed.
clip_indices – Whether to enforce a valid range on the lookup indices with clipping. When False, out-of-range values have undefined behaviour.
- Returns
A Tensor containing the elements requested by the user indices.
- register(session=None)
Creates a host embedding context manager bound to the given session.
- Parameters
session – The session to register the embedding to.
- Returns
A Python context manager object. This object manages the lifetime of the host embedding connection to the IPU.
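A hedged usage sketch; the compiled graph, session and shapes are hypothetical placeholders:

import tensorflow as tf
from tensorflow.python import ipu

embedding = ipu.embedding_ops.create_host_embedding(
    "my_embedding", shape=[1000, 64], dtype=tf.float32)

# ... build a compiled IPU graph which calls embedding.lookup(indices) ...

with tf.Session() as sess:
  with embedding.register(sess):
    # Run the steps which perform the host embedding lookups here.
    sess.run(lookup_result)  # hypothetical fetched tensor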
- class tensorflow.python.ipu.embedding_ops.HostEmbeddingOptimizerSpec(learning_rate, optimizer_name=None)
Description of the Host Embedding optimizer.
Despite the embedding living on the host, we want to compute the gradients on the device. Additionally, the communication channel between the device and host is opaque to TensorFlow. For these reasons we need to describe the optimizer parameters separately.
Currently only supports SGD.
- __init__(learning_rate, optimizer_name=None)
Create a HostEmbeddingOptimizerSpec.
- Parameters
learning_rate – The SGD learning rate.
- create_deregister_instruction(embedding_tensor, slot_vars, name)
Create a deregister instruction.
This will be called when exiting the HostEmbedding context manager.
- Parameters
embedding_tensor – The TF embedding tensor bound to the CPU.
slot_vars – Any created slot variables.
name – The name of the host embedding.
- Returns
The deregister instruction.
- create_lookup_instruction(embedding_tensor, indices, slot_vars, partition_strategy, name)
Create a lookup instruction.
This will be called from the HostEmbedding wrapper class.
- Parameters
embedding_tensor – The TF embedding tensor bound to the CPU.
indices – The TF indices tensor bound to the IPU.
slot_vars – Any created slot variables.
partition_strategy – The user selected partition strategy.
name – The name of the host embedding.
- Returns
The result of the embedding lookup in an IPU tensor.
- create_register_instruction(embedding_tensor, slot_vars, name)
Create a register instruction.
This will be called when entering the HostEmbedding context manager.
- Parameters
embedding_tensor – The TF embedding tensor bound to the CPU.
slot_vars – Any created slot variables.
name – The name of the host embedding.
- Returns
The register instruction.
- create_slot_variables(embedding_tensor, name)
Create any required slot variables for this optimiser.
This will be called when exiting the HostEmbedding context manager.
- Parameters
embedding_tensor – The TF embedding tensor bound to the CPU.
name – The name of the host embedding.
- Returns
A list of TF tensors bound to the CPU.
- get_learning_rate()
Get the optimiser learning rate.
- Returns
The learning rate.
- class tensorflow.python.ipu.embedding_ops.HostEmbeddingSGDGAOptimizerSpec(learning_rate, accumulation_factor)
- __init__(learning_rate, accumulation_factor)
Create a HostEmbeddingSGDGAOptimizerSpec.
- Parameters
learning_rate – The SGD learning rate.
accumulation_factor – The gradient accumulation factor.
- tensorflow.python.ipu.embedding_ops.create_host_embedding(name, shape, dtype, partition_strategy='TOKEN', optimizer_spec=None, initializer=None)
Create a HostEmbedding.
- Parameters
name – The name which uniquely identifies the embedding.
shape – The shape for the tensor which will hold the embedding.
dtype – The dtype for the tensor which will hold the embedding.
partition_strategy – When enable_experimental_remote_buffer_embedding is True and using replication, the embedding must be distributed across the replicas. This option decides on which axis the embedding will be split. Options are “TOKEN” or “ENCODING”.
optimizer_spec – A description of how the embedding will be optimized. When None, the embedding is assumed to not be trainable.
initializer – The initializer to use when creating the embedding tensor.
- Returns
A HostEmbedding object that wraps the created embedding tensor.
- tensorflow.python.ipu.embedding_ops.embedding_lookup(params, ids, name=None, serialization_factor=1, one_hot_threshold=0, min_encoding_size=1216)
Looks up ids in a list of embedding tensors. (deprecated arguments)
Warning: SOME ARGUMENTS ARE DEPRECATED: (min_encoding_size, one_hot_threshold). They will be removed in a future version. Instructions for updating: stop passing these arguments.
This is designed to be a drop-in replacement for the typical use cases with tf.nn.embedding_lookup for the IPU.
- Parameters
params – A single tensor representing the complete embedding tensor.
ids – A Tensor with type int32 containing the slices to be extracted from params.
name – A name for the operation.
serialization_factor – If greater than 1, the embedding lookup will be broken up into serialization_factor smaller lookups, serialized along the 0th dimension. This option should not be used unless params is used by another operation, such as matrix multiplication. If params has multiple users, then serialization can reduce the maximum memory at the cost of extra computation.
one_hot_threshold – Deprecated.
min_encoding_size – Deprecated.
- Returns
A Tensor with the same type as the tensors in params.
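For example, a minimal sketch (the shapes are illustrative):

import tensorflow as tf
from tensorflow.python import ipu

def my_net(ids):
  params = tf.get_variable("embedding", shape=[1000, 64],
                           dtype=tf.float32)
  # Drop-in replacement for tf.nn.embedding_lookup on the IPU.
  return ipu.embedding_ops.embedding_lookup(params, ids)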
14.12.12. Popops reduce scatter operator
- tensorflow.python.ipu.reduce_scatter_op.reduce_scatter(x, replication_factor, name=None)
Reduce (sum) the given replicated tensor with the result scattered across the replicas. For an input of shape [num_elements], the output will have shape [ceil(num_elements / replication_factor)]. If replication_factor does not evenly divide num_elements, the result is zero-padded. Example:

Input:  Replica0: [x0, y0, z0]
        Replica1: [x1, y1, z1]
Output: Replica0: [x0 + x1, y0 + y1]
        Replica1: [z0 + z1, 0]
- Parameters
x – The input Tensor. Must have rank 1.
replication_factor – The replication factor of the model.
name – Optional op name.
- Returns
A Tensor with the result for this replica.
14.12.13. Poprand operators
- tensorflow.python.ipu.rand_ops.dropout(x, rate=0.5, noise_shape=None, seed=None, name=None)
This targets the PopLibs Poprand operation, optimized for execution on the IPU.
With probability rate, drops elements of x. Inputs which are kept are scaled up by 1 / (1 - rate) such that the expected sum is unchanged.
- Parameters
x – The input tensor.
rate – The probability that a given element will be zeroed out.
noise_shape – An optional parameter that determines the shape of the dropout. Regular, unshaped dropout is used if not specified.
seed – An optional two-element tensor-like object (tf.Tensor, a numpy array or Python list/tuple), representing the random seed that will be used to create the distribution for dropout.
name – Optional op name.
- Returns
A Tensor which has some nodes set to zero, as randomly selected based on other parameters.
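For example, a minimal sketch (the rate and seed are illustrative):

from tensorflow.python import ipu

def my_net(x):
  # Zero roughly 30% of the elements; kept elements are scaled
  # by 1 / (1 - 0.3) so the expected sum is unchanged.
  return ipu.rand_ops.dropout(x, rate=0.3, seed=[42, 43])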
14.12.14. Utility operations to be used in replicated mode
- tensorflow.python.ipu.replication_ops.replication_index(name=None)
An operation which allows the user to get the replication index.
- Parameters
name – Optional op name.
- Returns
A Tensor initialized with the replication index.
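For example, a sketch which gives each replica a different offset (assumes a replicated graph):

import tensorflow as tf
from tensorflow.python import ipu

def my_net(x):
  # Returns this replica's index as a scalar tensor.
  index = ipu.replication_ops.replication_index()
  return x + tf.cast(index, x.dtype)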
14.12.15. Summary operations for IPUs
- tensorflow.python.ipu.summary_ops.ipu_compile_summary(name, op_list, collections=None)
Create an IPU compiler summary operation.
- Parameters
name – A name for the summary.
op_list – An operation or list of operations to make this summary dependent upon.
collections – Optional collections to add the summary into.
- Returns
The new summary operation
- tensorflow.python.ipu.summary_ops.tensor_summary(name, tensor, summary_description=None, collections=None, summary_metadata=None, family=None, display_name=None)
Outputs a Summary protocol buffer with a serialized tensor.proto.
- Parameters
name – A name for the generated node. If display_name is not set, it will also serve as the tag name in TensorBoard. (In that case, the tag name will inherit tf name scopes.)
tensor – A tensor of any type and shape to serialize.
summary_description – A long description of the summary sequence. Markdown is supported.
collections – Optional list of graph collections keys. The new summary op is added to these collections. Defaults to [GraphKeys.SUMMARIES].
summary_metadata – Optional SummaryMetadata proto (which describes which plugins may use the summary value).
family – Optional; if provided, used as the prefix of the summary tag, which controls the name used for display on TensorBoard when display_name is not set.
display_name – A string used to name this data in TensorBoard. If this is not set, then the node name will be used instead.
- Returns
A scalar Tensor of type string. The serialized Summary protocol buffer.
14.13. Optimisers
It is also possible to access the optimisers via the tensorflow.python.ipu.optimizers namespace, for example: tensorflow.python.ipu.optimizers.cross_replica_optimizer.CrossReplicaOptimizer.
14.13.1. Optimizer wrapper for replicated graphs
- class tensorflow.python.ipu.cross_replica_optimizer.CrossReplicaOptimizer(opt, name='CrossReplicaOptimizer')
An optimizer that averages gradients across IPU replicas.
- __init__(opt, name='CrossReplicaOptimizer')
Construct a new cross-replica optimizer.
- Parameters
opt – An existing Optimizer to encapsulate.
name – Optional name prefix for the operations created when applying gradients. Defaults to “CrossReplicaOptimizer”.
- apply_gradients(grads_and_vars, global_step=None, name=None)
Apply gradients to variables.
Calls popops_cross_replica_sum.cross_replica_sum() to sum gradient contributions across replicas, and then applies the real optimizer.
- Parameters
grads_and_vars – List of (gradient, variable) pairs as returned by compute_gradients().
global_step – Optional Variable to increment by one after the variables have been updated.
name – Optional name for the returned operation. Defaults to the name passed to the Optimizer constructor.
- Returns
An Operation that applies the gradients. If global_step was not None, that operation also increments global_step.
- Raises
ValueError – If the grads_and_vars is malformed.
- compute_gradients(loss, var_list=None, **kwargs)
Compute gradients of “loss” for the variables in “var_list”.
This simply wraps the compute_gradients() from the real optimizer. The gradients will be aggregated in apply_gradients() so that the user can modify the gradients beforehand, for example by clipping with a per-replica global norm if needed. Computing the global norm over the aggregated gradients can be undesirable, as one replica's very large gradients can hurt the gradients from the other replicas.
- Parameters
loss – A Tensor containing the value to minimize.
var_list – Optional list or tuple of tf.Variable to update to minimize loss. Defaults to the list of variables collected in the graph under the key GraphKey.TRAINABLE_VARIABLES.
**kwargs – Keyword arguments for compute_gradients().
- Returns
A list of (gradient, variable) pairs.
- get_slot(*args, **kwargs)
Return a slot named “name” created for “var” by the Optimizer.
This simply wraps the get_slot() from the actual optimizer.
- Parameters
*args – Arguments for get_slot().
**kwargs – Keyword arguments for get_slot().
- Returns
The Variable for the slot if it was created, None otherwise.
- get_slot_names(*args, **kwargs)
Return a list of the names of slots created by the Optimizer.
This simply wraps the get_slot_names() from the actual optimizer.
- Parameters
*args – Arguments for get_slot_names().
**kwargs – Keyword arguments for get_slot_names().
- Returns
A list of strings.
- variables()
Forwards the variables from the underlying optimizer.
14.13.2. Optimizer wrappers which perform local gradient accumulation
- class tensorflow.python.ipu.gradient_accumulation_optimizer.CrossReplicaGradientAccumulationOptimizer(opt, num_mini_batches, verify_usage=True, name='CrossReplicaGradientAccumulationOptimizer')
An optimizer where instead of performing the weight update for every batch, gradients across multiple batches are accumulated. After multiple batches have been processed, their accumulated gradients are then reduced across the replicas before being used to compute the weight update.
Accumulating gradients in this way allows us to simulate bigger batch sizes. For example, if we have a model with a batch size of 16 and we accumulate the gradients of 4 batches, this simulates an input batch of size 64.
This optimizer is similar to GradientAccumulationOptimizer, however using this optimizer guarantees that the accumulated gradients will only be exchanged between IPUs when the accumulated gradients are back-propagated through the network.
- __init__(opt, num_mini_batches, verify_usage=True, name='CrossReplicaGradientAccumulationOptimizer')
Construct a Cross Replica Gradient Accumulation Optimizer.
- Parameters
opt – An existing Optimizer to encapsulate.
num_mini_batches – Number of mini-batches the gradients will be accumulated for.
verify_usage – The current gradient accumulation supports the GradientDescentOptimizer and MomentumOptimizer optimizers. Any other usage of this optimizer might result in incorrect results. This option can be used to disable this check.
name – Optional name prefix for the operations created when applying gradients. Defaults to “CrossReplicaGradientAccumulationOptimizer”.
- apply_gradients(grads_and_vars, global_step=None, name=None)
Apply gradients to variables.
- Parameters
grads_and_vars – List of (gradient, variable) pairs as returned by compute_gradients().
global_step – Optional Variable to increment by one after the variables have been updated.
name – Optional name for the returned operation. Defaults to the name passed to the Optimizer constructor.
- Returns
An Operation that applies the gradients. If global_step was not None, that operation also increments global_step.
- compute_gradients(loss, var_list=None, **kwargs)
Compute gradients of “loss” for the variables in “var_list”.
This simply wraps the compute_gradients() from the real optimizer. The gradients will be aggregated in apply_gradients() so that the user can modify the gradients, for example by clipping.
- Parameters
loss – A Tensor containing the value to minimize.
var_list – Optional list or tuple of tf.Variable to update to minimize loss. Defaults to the list of variables collected in the graph under the key GraphKey.TRAINABLE_VARIABLES.
**kwargs – Keyword arguments for compute_gradients().
- Returns
A list of (gradient, variable) pairs.
- get_slot(*args, **kwargs)
Return a slot named “name” created for “var” by the Optimizer.
This simply wraps the get_slot() from the actual optimizer.
- Parameters
*args – Arguments for get_slot().
**kwargs – Keyword arguments for get_slot().
- Returns
The Variable for the slot if it was created, None otherwise.
- get_slot_names(*args, **kwargs)
Return a list of the names of slots created by the Optimizer.
This simply wraps the get_slot_names() from the actual optimizer.
- Parameters
*args – Arguments for get_slot_names().
**kwargs – Keyword arguments for get_slot_names().
- Returns
A list of strings.
- variables()
Forwards the variables from the underlying optimizer.
- class tensorflow.python.ipu.gradient_accumulation_optimizer.CrossReplicaGradientAccumulationOptimizerV2(opt, num_mini_batches, offload_weight_update_variables=None, replicated_optimizer_state_sharding=False, name='CrossReplicaGradientAccumulationOptimizerV2')
An optimizer where instead of performing the weight update for every batch, gradients across multiple batches are accumulated. After multiple batches have been processed, their accumulated gradients are then reduced across the replicas before being used to compute the weight update.
Accumulating gradients in this way allows us to simulate bigger batch sizes. For example, if we have a model with a batch size of 16 and we accumulate the gradients of 4 batches, this simulates an input batch of size 64.
This optimizer is similar to GradientAccumulationOptimizerV2, however using this optimizer guarantees that the accumulated gradients will only be exchanged between IPUs when the gradients are applied to the weights, and hence reduces the number of cross-IPU gradient exchanges by a factor of ‘num_mini_batches’.
- __init__(opt, num_mini_batches, offload_weight_update_variables=None, replicated_optimizer_state_sharding=False, name='CrossReplicaGradientAccumulationOptimizerV2')
Construct a Cross Replica Gradient Accumulation Optimizer V2.
- Parameters
opt – An existing Optimizer to encapsulate.
num_mini_batches – Number of mini-batches the gradients will be accumulated for.
offload_weight_update_variables – If True, any tf.Variable which is only used by the weight update of the model (for example the accumulator variable when using the tf.MomentumOptimizer), will be stored in the remote memory. During the weight update this variable will be streamed onto the device and then streamed back to the remote memory after it has been updated. Requires the machine to be configured with support for Poplar remote buffers. Offloading variables into remote memory can reduce maximum memory liveness, but can also increase the computation time of the weight update.
replicated_optimizer_state_sharding – If True, any tf.Variable which is offloaded will be partitioned across the replicas. A collective all-gather will be inserted to restore the tensor on each replica. If None, this value will match the value of offload_weight_update_variables.
name – Optional name prefix for the operations created when applying gradients. Defaults to “CrossReplicaGradientAccumulationOptimizerV2”.
- apply_gradients(*args, **kwargs)
Apply gradients to variables.
- Parameters
*args – Arguments for apply_gradients().
**kwargs – Keyword arguments for apply_gradients().
- Returns
An Operation that applies the gradients. If global_step was not None, that operation also increments global_step.
- compute_gradients(*args, **kwargs)
Compute gradients of “loss” for the variables in “var_list”.
This simply wraps the compute_gradients() from the real optimizer. The gradients will be aggregated in apply_gradients() so that the user can modify the gradients, for example by clipping.
- Parameters
*args – Arguments for compute_gradients().
**kwargs – Keyword arguments for compute_gradients().
- Returns
A list of (gradient, variable) pairs.
- get_slot(*args, **kwargs)
Return a slot named “name” created for “var” by the Optimizer.
This simply wraps the get_slot() from the actual optimizer.
- Parameters
*args – Arguments for get_slot().
**kwargs – Keyword arguments for get_slot().
- Returns
The Variable for the slot if it was created, None otherwise.
- get_slot_names(*args, **kwargs)
Return a list of the names of slots created by the Optimizer.
This simply wraps the get_slot_names() from the actual optimizer.
- Parameters
*args – Arguments for get_slot_names().
**kwargs – Keyword arguments for get_slot_names().
- Returns
A list of strings.
- variables()
Forwards the variables from the underlying optimizer.
- class tensorflow.python.ipu.gradient_accumulation_optimizer.GradientAccumulationOptimizer(opt, num_mini_batches, verify_usage=True, name='GradientAccumulationOptimizer')
An optimizer where instead of performing the weight update for every batch, gradients across multiple batches are accumulated. After multiple batches have been processed, their accumulated gradients are used to compute the weight update.
Accumulating gradients in this way allows us to simulate bigger batch sizes. For example, if we have a model with a batch size of 16 and we accumulate the gradients of 4 batches, this simulates an input batch of size 64.
This optimizer supports tf.train.GradientDescentOptimizer and tf.train.MomentumOptimizer only. All other optimizers should use GradientAccumulationOptimizerV2.
- __init__(opt, num_mini_batches, verify_usage=True, name='GradientAccumulationOptimizer')
Construct a Gradient Accumulation Optimizer.
- Parameters
opt – An existing Optimizer to encapsulate.
num_mini_batches – Number of mini-batches the gradients will be accumulated for.
verify_usage – The current gradient accumulation supports the GradientDescentOptimizer and MomentumOptimizer optimizers. Any other usage of this optimizer might result in incorrect results. This option can be used to disable this check.
name – Optional name prefix for the operations created when applying gradients. Defaults to “GradientAccumulationOptimizer”.
- apply_gradients(grads_and_vars, global_step=None, name=None)
Apply gradients to variables.
- Parameters
grads_and_vars – List of (gradient, variable) pairs as returned by compute_gradients().
global_step – Optional Variable to increment by one after the variables have been updated.
name – Optional name for the returned operation. Defaults to the name passed to the Optimizer constructor.
- Returns
An Operation that applies the gradients. If global_step was not None, that operation also increments global_step.
- Raises
ValueError – If the grads_and_vars is malformed.
- compute_gradients(loss, var_list=None, **kwargs)
Compute gradients of “loss” for the variables in “var_list”.
This simply wraps the compute_gradients() from the real optimizer. The gradients will be aggregated in apply_gradients() so that the user can modify the gradients, for example by clipping.
- Parameters
loss – A Tensor containing the value to minimize.
var_list – Optional list or tuple of tf.Variable to update to minimize loss. Defaults to the list of variables collected in the graph under the key GraphKey.TRAINABLE_VARIABLES.
**kwargs – Keyword arguments for compute_gradients().
- Returns
A list of (gradient, variable) pairs.
- get_slot(*args, **kwargs)
Return a slot named “name” created for “var” by the Optimizer.
This simply wraps the get_slot() from the actual optimizer.
- Parameters
*args – Arguments for get_slot().
**kwargs – Keyword arguments for get_slot().
- Returns
The Variable for the slot if it was created, None otherwise.
- get_slot_names(*args, **kwargs)
Return a list of the names of slots created by the Optimizer.
This simply wraps the get_slot_names() from the actual optimizer.
- Parameters
*args – Arguments for get_slot_names().
**kwargs – Keyword arguments for get_slot_names().
- Returns
A list of strings.
- variables()
Forwards the variables from the underlying optimizer.
- class tensorflow.python.ipu.gradient_accumulation_optimizer.GradientAccumulationOptimizerV2(opt, num_mini_batches, offload_weight_update_variables=None, replicated_optimizer_state_sharding=False, name='GradientAccumulationOptimizerV2')
An optimizer where instead of performing the weight update for every batch, gradients across multiple batches are accumulated. After multiple batches have been processed, their accumulated gradients are used to compute the weight update.
Accumulating gradients in this way allows us to simulate bigger batch sizes. For example, if we have a model with a batch size of 16 and we accumulate the gradients of 4 batches, this simulates an input batch of size 64.
Unlike ‘GradientAccumulationOptimizer’, this optimizer can be used to wrap any other TensorFlow optimizer.
See the Gradient accumulation section in the documentation for more details.
- __init__(opt, num_mini_batches, offload_weight_update_variables=None, replicated_optimizer_state_sharding=False, name='GradientAccumulationOptimizerV2')
Construct a Gradient Accumulation Optimizer V2.
- Parameters
opt – An existing Optimizer to encapsulate.
num_mini_batches – Number of mini-batches the gradients will be accumulated for.
offload_weight_update_variables – When enabled, any tf.Variable which is only used by the weight update of the pipeline (for example the accumulator variable when using the tf.MomentumOptimizer), will be stored in the remote memory. During the weight update this variable will be streamed onto the device and then streamed back to the remote memory after it has been updated. Requires the machine to be configured with support for Poplar remote buffers. Offloading variables into remote memory can reduce maximum memory liveness, but can also increase the computation time of the weight update. When set to None the variables will be placed in either in-processor or remote memory automatically based on the current best placement strategy.
replicated_optimizer_state_sharding – If True, any tf.Variable which is offloaded will be partitioned across the replicas. A collective all-gather will be inserted to restore the tensor on each replica. If None, this value will match the value of offload_weight_update_variables.
name – Optional name prefix for the operations created when applying gradients. Defaults to “GradientAccumulationOptimizerV2”.
- apply_gradients(grads_and_vars, global_step=None, name=None)
Apply gradients to variables.
- Parameters
grads_and_vars – List of (gradient, variable) pairs as returned by compute_gradients().
global_step – Optional Variable to increment by one after the variables have been updated.
name – Optional name for the returned operation. Defaults to the name passed to the Optimizer constructor.
- Returns
An Operation that applies the gradients. If global_step was not None, that operation also increments global_step.
- Raises
ValueError – If the grads_and_vars is malformed.
- compute_gradients(*args, **kwargs)
Compute gradients of “loss” for the variables in “var_list”.
This simply wraps the compute_gradients() from the real optimizer. The gradients will be aggregated in apply_gradients() so that the user can modify the gradients, for example by clipping.
- Parameters
*args – Arguments for compute_gradients().
**kwargs – Keyword arguments for compute_gradients().
- Returns
A list of (gradient, variable) pairs.
- get_slot(*args, **kwargs)
Return a slot named “name” created for “var” by the Optimizer.
This simply wraps the get_slot() from the actual optimizer.
- Parameters
*args – Arguments for get_slot().
**kwargs – Keyword arguments for get_slot().
- Returns
The Variable for the slot if it was created, None otherwise.
- get_slot_names(*args, **kwargs)
Return a list of the names of slots created by the Optimizer.
This simply wraps the get_slot_names() from the actual optimizer.
- Parameters
*args – Arguments for get_slot_names().
**kwargs – Keyword arguments for get_slot_names().
- Returns
A list of strings.
- variables()
Forwards the variables from the underlying optimizer.
14.13.3. Optimizer wrapper that modifies gradients before application
- class tensorflow.python.ipu.map_gradient_optimizer.MapGradientOptimizer(wrapped_optimizer, gradient_mapping_function, name='MapGradientOptimizer')
This class enables modification of the computed gradients, before they are passed to the final optimizer for application.
MapGradientOptimizer needs a map function that will modify the gradients, and an optimizer to which the modified gradients are passed.
The map function has two arguments: gradient and variable. The map function must return the modified gradient.
Example
# Define a function which will modify computed gradients.
# This is a gradient decay function.
def map_fn_decay(grad, var):
  return grad + (WEIGHT_DECAY * var)

# To run the code we need a session:
with self.cached_session():
  optimizer = gradient_descent.GradientDescentOptimizer(0.000001)
  # We define the MapGradientOptimizer.
  map_optimizer = map_gradient_optimizer.MapGradientOptimizer(
      optimizer, map_fn_decay)
  # Gradients are computed by compute_gradients(), where our map function
  # modifies the computed gradients. compute_gradients(loss, var_list)
  # arguments are loss and var_list, so define the arguments and call
  # map_optimizer.compute_gradients().
  values = [1.0, 2.0, 3.0]
  vars_ = [variables.Variable([v], dtype=dtypes.float32) for v in values]
  grads_and_vars = map_optimizer.compute_gradients(
      vars_[0] * vars_[1] + vars_[0] * vars_[2] + vars_[1] * vars_[2],
      vars_)
  # The output grads_and_vars contains the computed gradients modified by
  # the decay map function.
  # The gradients are 5.01, 4.02 and 3.03. If we had not used
  # MapGradientOptimizer they would be 5, 4 and 3.
- __init__(wrapped_optimizer, gradient_mapping_function, name='MapGradientOptimizer')
Construct a MapGradientOptimizer.
- Parameters
wrapped_optimizer – TensorFlow (derived) optimizer.
gradient_mapping_function – The function to be applied on the gradients and variables which are provided by wrapped_optimizer.compute_gradients().
- apply_gradients(*args, **kwargs)
Apply gradients to variables.
This is the second part of minimize(). It returns an Operation that applies gradients.
- Parameters
grads_and_vars – List of (gradient, variable) pairs as returned by compute_gradients().
global_step – Optional Variable to increment by one after the variables have been updated.
name – Optional name for the returned operation. Defaults to the name passed to the Optimizer constructor.
- Returns
An Operation that applies the specified gradients. If global_step was not None, that operation also increments global_step.
- Raises
TypeError – If grads_and_vars is malformed.
ValueError – If none of the variables have gradients.
RuntimeError – If you should use _distributed_apply() instead.
- compute_gradients(*args, **kwargs)
Compute gradients of “loss” for the variables in “var_list”.
The gradients computed by the wrapped optimizer are modified using the gradient_mapping_function that was passed to the constructor.
- Parameters
loss – A Tensor containing the value to minimize.
var_list – Optional list or tuple of tf.Variable to update to minimize loss. Defaults to the list of variables collected in the graph under the key GraphKey.TRAINABLE_VARIABLES.
**kwargs – Keyword arguments for compute_gradients().
- Returns
A list of (gradient, variable) pairs.
- get_slot(var, name)
Return a slot named name created for var by the Optimizer.
Some Optimizer subclasses use additional variables. For example Momentum and Adagrad use variables to accumulate updates. This method gives access to these Variable objects if for some reason you need them.
Use get_slot_names() to get the list of slot names created by the Optimizer.
- Parameters
var – A variable passed to minimize() or apply_gradients().
name – A string.
- Returns
The Variable for the slot if it was created, None otherwise.
- get_slot_names()
Return a list of the names of slots created by the Optimizer.
See get_slot().
- Returns
A list of strings.
- variables()
A list of variables which encode the current state of Optimizer.
Includes slot variables and additional global variables created by the optimizer in the current default graph.
- Returns
A list of variables.
14.13.4. Optimizer wrapper for sharded graphs
- class tensorflow.python.ipu.sharded_optimizer.ShardedOptimizer(optimizer)
- __init__(optimizer)
Construct a new sharded optimizer.
- Parameters
optimizer – The optimizer to wrap.
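A hedged sketch combining ShardedOptimizer with the ipu_shard scopes documented earlier; the model and loss are illustrative:

import tensorflow as tf
from tensorflow.python import ipu

def my_net(x, labels):
  with ipu.scopes.ipu_shard(0):
    logits = tf.layers.dense(x, 10)  # placed on shard 0
  with ipu.scopes.ipu_shard(1):
    # Loss computed on shard 1.
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=labels, logits=logits))
  opt = ipu.sharded_optimizer.ShardedOptimizer(
      tf.train.GradientDescentOptimizer(0.01))
  return loss, opt.minimize(loss)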
- apply_gradients(grads_and_vars, global_step=None, name=None)
Apply gradients to variables.
This is the second part of minimize(). It returns an Operation that applies gradients.
- Parameters
grads_and_vars – List of (gradient, variable) pairs as returned by compute_gradients().
global_step – Optional Variable to increment by one after the variables have been updated.
name – Optional name for the returned operation. Defaults to the name passed to the Optimizer constructor.
- Returns
An Operation that applies the specified gradients. If global_step was not None, that operation also increments global_step.
- Raises
TypeError – If grads_and_vars is malformed.
ValueError – If none of the variables have gradients.
RuntimeError – If you should use _distributed_apply() instead.
- compute_gradients(loss, var_list=None, **kwargs)
Compute gradients of loss for the variables in var_list.
This is the first part of minimize(). It returns a list of (gradient, variable) pairs where “gradient” is the gradient for “variable”. Note that “gradient” can be a Tensor, an IndexedSlices, or None if there is no gradient for the given variable.
- Parameters
loss – A Tensor containing the value to minimize or a callable taking no arguments which returns the value to minimize. When eager execution is enabled it must be a callable.
var_list – Optional list or tuple of tf.Variable to update to minimize loss. Defaults to the list of variables collected in the graph under the key GraphKeys.TRAINABLE_VARIABLES.
gate_gradients – How to gate the computation of gradients. Can be GATE_NONE, GATE_OP, or GATE_GRAPH.
aggregation_method – Specifies the method used to combine gradient terms. Valid values are defined in the class AggregationMethod.
colocate_gradients_with_ops – If True, try colocating gradients with the corresponding op.
grad_loss – Optional. A Tensor holding the gradient computed for loss.
- Returns
A list of (gradient, variable) pairs. Variable is always present, but gradient can be None.
- Raises
TypeError – If var_list contains anything else than Variable objects.
ValueError – If some arguments are invalid.
RuntimeError – If called with eager execution enabled and loss is not callable.
@compatibility(eager) When eager execution is enabled, gate_gradients, aggregation_method, and colocate_gradients_with_ops are ignored. @end_compatibility
- get_slot_names(*args, **kwargs)
Return a list of the names of slots created by the Optimizer.
See get_slot().
- Returns
A list of strings.
- variables()
A list of variables which encode the current state of Optimizer.
Includes slot variables and additional global variables created by the optimizer in the current default graph.
- Returns
A list of variables.
14.14. Sharding
14.14.1. Automatic graph sharding
- tensorflow.python.ipu.autoshard.automatic_sharding(num_shards, input_ts, loss_ts, edge_filter=None, frozen_inference=False)
Automatically set shards for all connected nodes in the graph.
- Parameters
num_shards – number of shards to split the graph over.
input_ts – tensor closest to the datafeed in the graph.
loss_ts – tensor closest to the loss in the graph.
edge_filter – a callable predicate, with the signature fn(edge), where edge is a tuple containing the name of the source op and the name of the destination op. If the predicate returns True then the graph will not be split at that edge. Only used if frozen_inference is False.
frozen_inference – Flag set to True if running inference on a frozen graph.
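A hedged sketch; build_model is a hypothetical helper, and the call is made after the graph between input_ts and loss_ts has been constructed:

from tensorflow.python import ipu

def my_net(x, labels):
  loss = build_model(x, labels)  # hypothetical model-building helper
  # Split the graph between x and loss across 2 shards.
  ipu.autoshard.automatic_sharding(num_shards=2, input_ts=x, loss_ts=loss)
  return loss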
- tensorflow.python.ipu.autoshard.ipu_autoshard()
Provides a context for autosharding. All operations created within this context will be automatically sharded.
14.14.2. Utility functions for sharding graphs
- tensorflow.python.ipu.sharding.dependencies(roots)
Find a list of ancestor operations for a given set of root operations.
- Parameters
roots – The root operations from which to start.
- tensorflow.python.ipu.sharding.get_shard_from_colocation(op)
Find the shard number from an op which shares co-location information with the given operation.
- Parameters
op – The operation to apply sharding to.
- tensorflow.python.ipu.sharding.has_attr(o, attr_name)
Test for the presence of a specific attribute.
- Parameters
o – An operation.
attr_name – The name of an attribute to test for.
- Returns
True if the operation has the given attribute.
- tensorflow.python.ipu.sharding.propagate_sharding(g)
Move the sharding from the forward pass operations onto their co-located backward pass operations.
- Parameters
g – The graph.