6. API reference
6.1. Data loaders
- class poptorch_geometric.dataloader.DataLoader(dataset, batch_size=1, shuffle=False, follow_batch=None, exclude_keys=None, options=None, **kwargs)
A data loader which merges data objects from a torch_geometric.data.Dataset into a mini-batch. Data objects can be of type Data or HeteroData.
- Parameters
dataset (Dataset) – The dataset from which to load the data.
batch_size (int, optional) – How many samples per batch to load. (default: 1)
shuffle (bool, optional) – If set to True, the data will be reshuffled at every epoch. (default: False)
follow_batch (List[str], optional) – Creates assignment batch vectors for each key in the list. (default: None)
exclude_keys (List[str], optional) – Keys to exclude from the mini-batch. (default: None)
options (poptorch.Options, optional) – The additional PopTorch options to be passed to poptorch.DataLoader. (default: None)
**kwargs (optional) – Additional arguments of poptorch.DataLoader.
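In PyTorch Geometric-style batching, merging graphs into a mini-batch does three things. It concatenates node features, shifts each graph's edge indices by the number of nodes that precede it, and records a `batch` vector assigning every node to its graph. follow_batch requests similar vectors for other keys. The sketch below shows the idea in plain Python, without torch or poptorch_geometric. The `collate_graphs` helper and the dict-based graph format are hypothetical.

```python
from typing import Dict, List

# Hypothetical, simplified graph format for illustration only:
# "x" holds one feature row per node, "edge_index" holds [sources, targets].
Graph = Dict[str, list]


def collate_graphs(graphs: List[Graph]) -> Dict[str, list]:
    """Merge graphs into one disconnected graph, PyG style."""
    x: List[list] = []
    src: List[int] = []
    dst: List[int] = []
    batch: List[int] = []
    offset = 0
    for graph_id, g in enumerate(graphs):
        num_nodes = len(g["x"])
        x.extend(g["x"])
        # Shift edge endpoints so they index into the merged node list.
        src.extend(s + offset for s in g["edge_index"][0])
        dst.extend(d + offset for d in g["edge_index"][1])
        # Assignment vector: which graph each node belongs to.
        batch.extend([graph_id] * num_nodes)
        offset += num_nodes
    return {"x": x, "edge_index": [src, dst], "batch": batch}


g1 = {"x": [[1.0], [2.0]], "edge_index": [[0], [1]]}
g2 = {"x": [[3.0], [4.0], [5.0]], "edge_index": [[0, 1], [1, 2]]}
merged = collate_graphs([g1, g2])
print(merged["edge_index"])  # [[0, 2, 3], [1, 3, 4]]
print(merged["batch"])       # [0, 0, 1, 1, 1]
```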
- class poptorch_geometric.dataloader.FixedSizeDataLoader(dataset, batch_size=2, shuffle=False, fixed_size_options=None, fixed_size_strategy=FixedSizeStrategy.PadToMax, over_size_strategy=OverSizeStrategy.Error, add_pad_masks=True, follow_batch=None, exclude_keys=None, options=None, **kwargs)
A data loader which merges data objects from a torch_geometric.data.Dataset into a mini-batch and pads node and edge features so that tensors across all mini-batches have the same shapes. Data objects can be of type Data or HeteroData.
- Parameters
dataset (Dataset) – The Dataset instance from which to load the graph samples.
batch_size (int, optional) – The number of graph samples to load in each mini-batch. This should be at least 2 to allow for creating at least one padding graph. (default: 2)
shuffle (bool, optional) – If set to True, the data will be reshuffled at every epoch. (default: False)
fixed_size_options (FixedSizeOptions, optional) – A poptorch_geometric.fixed_size_options.FixedSizeOptions object which holds the maximum number of nodes, edges and other options required to pad the mini-batches produced by the data loader to a fixed size. If not specified, this will be determined from the provided dataset. (default: None)
fixed_size_strategy (FixedSizeStrategy, optional) – The strategy to use to achieve fixed-size mini-batches. By default, each mini-batch will contain a fixed number of real graphs (batch_size - 1) plus one single graph for padding. (default: poptorch_geometric.FixedSizeStrategy.PadToMax)
over_size_strategy (OverSizeStrategy, optional) – The behaviour if a sample cannot fit in the fixed-size mini-batch. By default, if the required number of samples cannot fit into the fixed-size mini-batch, an error will be raised. (default: poptorch_geometric.OverSizeStrategy.Error)
add_pad_masks (bool, optional) – If True, mask objects are attached to the mini-batch result. They represent three levels of padding: graphs_mask (graph level mask), nodes_mask (node level mask) and edges_mask (edge level mask). Mask objects indicate which elements in the mini-batch are real (represented by True) and which were added as padding (represented by False). (default: True)
follow_batch (list or tuple, optional) – Creates assignment batch vectors for each key in the list. (default: None)
exclude_keys (list or tuple, optional) – Keys to exclude from the mini-batch. (default: None)
options (poptorch.Options, optional) – The additional PopTorch options to be passed to poptorch.DataLoader. (default: None)
**kwargs (optional) – Additional arguments of poptorch.DataLoader.
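With add_pad_masks=True, the masks let a model ignore padded elements. The sketch below builds the three masks for one mini-batch under PadToMax-style padding. It assumes that real graphs, nodes and edges come first and that every remaining slot up to the fixed size is padding. `build_pad_masks` is a hypothetical helper and does not reproduce the library's internals.

```python
from typing import Dict, List


def build_pad_masks(node_counts: List[int], edge_counts: List[int],
                    num_graphs: int, num_nodes: int,
                    num_edges: int) -> Dict[str, List[bool]]:
    """Build graph, node and edge masks for one padded mini-batch."""
    real_graphs = len(node_counts)
    real_nodes, real_edges = sum(node_counts), sum(edge_counts)
    # Leave at least one graph slot for the padding graph.
    if (real_graphs > num_graphs - 1 or real_nodes > num_nodes
            or real_edges > num_edges):
        raise RuntimeError("mini-batch does not fit the fixed size")
    return {
        "graphs_mask": [True] * real_graphs + [False] * (num_graphs - real_graphs),
        "nodes_mask": [True] * real_nodes + [False] * (num_nodes - real_nodes),
        "edges_mask": [True] * real_edges + [False] * (num_edges - real_edges),
    }


# batch_size=3 under PadToMax: two real graphs plus one padding graph.
masks = build_pad_masks([3, 2], [4, 2], num_graphs=3, num_nodes=8, num_edges=10)
print(masks["graphs_mask"])  # [True, True, False]

# A typical use: average a per-node value over real nodes only.
node_values = [1.0, 2.0, 3.0, 4.0, 5.0, 0.0, 0.0, 0.0]
real = [v for v, keep in zip(node_values, masks["nodes_mask"]) if keep]
print(sum(real) / len(real))  # 3.0
```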
- class poptorch_geometric.pyg_dataloader.FixedSizeStrategy(value)
Specify the strategy to use to achieve fixed-size mini-batches.
PadToMax: Each mini-batch will contain a fixed number of real graphs plus one single graph for padding.
StreamPack: If the next sample to batch can fit in the mini-batch, it will be added. This results in mini-batches with a varied number of real graphs, but minimises the amount of space wasted on padding in a mini-batch.
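The trade-off between the two strategies can be illustrated with per-graph node counts alone. The sketch below groups the same samples PadToMax-style and StreamPack-style, then counts the node slots filled with padding. The helper names are hypothetical. Edges and the graph-count limit are ignored. Giving StreamPack a budget of 5 real nodes out of 6 slots, which leaves one node for the padding graph, is an assumption made for the example.

```python
from typing import List


def pad_to_max_batches(node_counts: List[int], batch_size: int) -> List[List[int]]:
    """PadToMax-style grouping: batch_size - 1 real graphs per mini-batch."""
    per_batch = batch_size - 1
    return [node_counts[i:i + per_batch]
            for i in range(0, len(node_counts), per_batch)]


def stream_pack_batches(node_counts: List[int], max_nodes: int) -> List[List[int]]:
    """StreamPack-style grouping: add the next sample while it still fits."""
    batches: List[List[int]] = []
    current: List[int] = []
    for n in node_counts:
        if current and sum(current) + n > max_nodes:
            batches.append(current)
            current = []
        # A sample larger than max_nodes on its own still starts a batch
        # here; handling that case is the job of OverSizeStrategy.
        current.append(n)
    if current:
        batches.append(current)
    return batches


def wasted_node_slots(batches: List[List[int]], num_nodes: int) -> int:
    """Node slots filled with padding when every batch is padded to num_nodes."""
    return sum(num_nodes - sum(batch) for batch in batches)


counts = [1, 1, 2, 1, 1, 1]
pad_to_max = pad_to_max_batches(counts, batch_size=3)
stream = stream_pack_batches(counts, max_nodes=5)
print(pad_to_max, wasted_node_slots(pad_to_max, num_nodes=6))  # [[1, 1], [2, 1], [1, 1]] 11
print(stream, wasted_node_slots(stream, num_nodes=6))          # [[1, 1, 2, 1], [1, 1]] 5
```

Here StreamPack needs two mini-batches instead of three and fills 5 node slots with padding instead of 11.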
- class poptorch_geometric.pyg_dataloader.OverSizeStrategy(value)
Specify the behaviour if a sample cannot fit in the fixed-size mini-batch.
Error: If the required number of samples cannot fit into a mini-batch, an error will be thrown.
Skip: If the required number of samples cannot fit into a mini-batch, the samples that cannot fit will be skipped.
TrimNodes: If the required number of samples cannot fit into a mini-batch, the samples will still be added and then nodes will be removed from the mini-batch to achieve the fixed size. Enabling this can cause a loss of information in the samples of the mini-batch.
TrimEdges: If the required number of samples cannot fit into a mini-batch, the samples will still be added and then edges will be removed from the mini-batch to achieve the fixed size. Enabling this can cause a loss of information in the samples of the mini-batch.
TrimNodesAndEdges: If the required number of samples cannot fit into a mini-batch, the samples will still be added and then both nodes and edges will be removed from the mini-batch to achieve the fixed size. Enabling this can cause a loss of information in the samples of the mini-batch.
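The sketch below contrasts Error, Skip and TrimNodes on a single mini-batch, using only node counts. `OverSize` and `fit_samples` are hypothetical. Two details are assumptions of the sketch, not the library's documented behaviour: Skip drops samples greedily, and TrimNodes removes nodes from the end of the batch.

```python
from enum import Enum
from typing import List


class OverSize(Enum):
    """Hypothetical stand-in for three OverSizeStrategy members."""
    ERROR = "Error"
    SKIP = "Skip"
    TRIM_NODES = "TrimNodes"


def fit_samples(node_counts: List[int], max_nodes: int,
                strategy: OverSize) -> List[int]:
    """Return the per-sample node counts that end up in the mini-batch."""
    if sum(node_counts) <= max_nodes:
        return list(node_counts)
    if strategy is OverSize.ERROR:
        raise RuntimeError("samples do not fit into the fixed-size mini-batch")
    if strategy is OverSize.SKIP:
        kept: List[int] = []
        total = 0
        for n in node_counts:
            # Skip any sample that would push the batch over the limit.
            if total + n <= max_nodes:
                kept.append(n)
                total += n
        return kept
    # TRIM_NODES: keep every sample, then remove nodes from the end of the
    # batch until it fits; information in the removed nodes is lost.
    trimmed: List[int] = []
    budget = max_nodes
    for n in node_counts:
        take = min(n, budget)
        trimmed.append(take)
        budget -= take
    return trimmed


print(fit_samples([4, 3, 2], 8, OverSize.SKIP))        # [4, 3]
print(fit_samples([4, 3, 2], 8, OverSize.TRIM_NODES))  # [4, 3, 1]
```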
6.2. Cluster data loaders
- class poptorch_geometric.cluster_loader.FixedSizeClusterLoader(cluster_data, fixed_size_options, batch_size=1, over_size_strategy=OverSizeStrategy.TrimNodesAndEdges, add_pad_masks=True, options=None, **kwargs)
A data loader which merges data objects from a torch_geometric.loader.ClusterData into a mini-batch of clusters and pads node and edge features so that tensors across all mini-batches have constant shapes.
- Parameters
cluster_data (ClusterData) – The clustered graph data from which to load the clusters.
fixed_size_options (FixedSizeOptions) – A poptorch_geometric.fixed_size_options.FixedSizeOptions object which holds the maximum number of nodes, edges and other options required to pad the mini-batches produced by the data loader to a fixed size.
batch_size (int, optional) – The number of nodes per mini-batch to load. (default: 1)
over_size_strategy (OverSizeStrategy, optional) – The behaviour if a sample cannot fit in the fixed-size mini-batch. By default, if the required number of samples cannot fit into the fixed-size mini-batch, nodes and edges will be removed from the mini-batch to achieve the specified fixed size. (default: poptorch_geometric.OverSizeStrategy.TrimNodesAndEdges)
add_pad_masks (bool, optional) – If True, mask objects are attached to the mini-batch result. They represent three levels of padding: graphs_mask (graph level mask), nodes_mask (node level mask) and edges_mask (edge level mask). Mask objects indicate which elements in the mini-batch are real (represented by True) and which were added as padding (represented by False). (default: True)
options (poptorch.Options, optional) – The additional PopTorch options to be passed to poptorch.DataLoader. (default: None)
**kwargs (optional) – Additional arguments of poptorch.DataLoader.
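PyTorch Geometric's ClusterLoader merges the selected partitions of a graph and keeps the links between them. The sketch below shows that merging step, followed by a TrimNodesAndEdges-style truncation to fixed node and edge counts. `merge_clusters` is a hypothetical helper over plain Python lists. Which nodes and edges the real loader removes is not specified here, so truncating from the end is an assumption.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def merge_clusters(clusters: List[List[int]], edges: List[Edge],
                   num_nodes: int, num_edges: int) -> Dict[str, list]:
    """Merge selected clusters, then trim to at most num_nodes / num_edges."""
    nodes = [node for cluster in clusters for node in cluster]
    # Trim nodes (assumption: drop from the end of the merged list).
    nodes = nodes[:num_nodes]
    local_id = {node: i for i, node in enumerate(nodes)}
    # Keep edges whose endpoints are both present, including links between
    # two different selected clusters, relabelled to mini-batch-local ids.
    kept = [(local_id[s], local_id[d]) for s, d in edges
            if s in local_id and d in local_id]
    # Trim edges (same assumption).
    return {"nodes": nodes, "edge_index": kept[:num_edges]}


# A path graph 0-1-2-3-4-5 partitioned into three clusters.
graph_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
print(merge_clusters([[0, 1], [2, 3]], graph_edges, num_nodes=6, num_edges=10))
# {'nodes': [0, 1, 2, 3], 'edge_index': [(0, 1), (1, 2), (2, 3)]}
print(merge_clusters([[0, 1], [4, 5]], graph_edges, num_nodes=3, num_edges=10))
# {'nodes': [0, 1, 4], 'edge_index': [(0, 1)]}
```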
6.3. Collators
- class poptorch_geometric.collate.FixedSizeCollater(fixed_size_options, add_masks_to_batch=False, trim_nodes=False, trim_edges=False, follow_batch=None, exclude_keys=None)
Collates a batch of graphs as a torch_geometric.data.Batch of fixed-size tensors. Calling an instance of this class adds an additional graph with the necessary number of nodes and edges to pad the batch, so that tensors have the sizes corresponding to the maximum numbers of graphs, nodes and edges specified during initialisation.
Calling an instance of this class can raise a RuntimeError if the number of graphs (if set), nodes or edges in the batch is larger than the requested limits.
- Parameters
fixed_size_options (FixedSizeOptions) – A poptorch_geometric.fixed_size_options.FixedSizeOptions object which holds the maximum number of nodes, edges and other options required to pad the batches produced by the collater to a fixed size.
add_masks_to_batch (bool, optional) – If set to True, mask objects are attached to the batch result. They represent three levels of padding: graphs_mask (graph level mask), nodes_mask (node level mask) and edges_mask (edge level mask). Mask objects indicate which elements in the batch are real (represented by True) and which were added as padding (represented by False). (default: False)
trim_nodes (bool, optional) – If set to True, randomly prune nodes from the batch so that it satisfies the num_nodes limit. (default: False)
trim_edges (bool, optional) – If set to True, randomly prune edges from the batch so that it satisfies the num_edges limit. (default: False)
follow_batch (list or tuple, optional) – Creates assignment batch vectors for each key in the list. (default: None)
exclude_keys (list or tuple, optional) – The keys to exclude from the graphs in the output batch. (default: None)
- Return type
None
- class LabelsType(value)
An enumeration.
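The trim_nodes option is described as randomly pruning nodes. A minimal sketch of that idea samples the nodes to keep, relabels them, and drops any edge that touched a removed node. `random_trim_nodes` is hypothetical and works on plain lists. The real collater's sampling scheme may differ.

```python
import random
from typing import List, Tuple

Edge = Tuple[int, int]


def random_trim_nodes(num_nodes: int, edges: List[Edge], max_nodes: int,
                      rng: random.Random) -> Tuple[int, List[Edge]]:
    """Randomly keep max_nodes nodes and the edges between them."""
    if num_nodes <= max_nodes:
        return num_nodes, list(edges)
    keep = sorted(rng.sample(range(num_nodes), max_nodes))
    # Relabel surviving nodes to 0..max_nodes-1, preserving their order.
    new_id = {old: new for new, old in enumerate(keep)}
    # An edge is dropped if either endpoint was removed.
    kept_edges = [(new_id[s], new_id[d]) for s, d in edges
                  if s in new_id and d in new_id]
    return max_nodes, kept_edges


path = [(0, 1), (1, 2), (2, 3), (3, 4)]
n, trimmed = random_trim_nodes(5, path, max_nodes=3, rng=random.Random(0))
print(n)  # 3
```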
6.4. Batch samplers
- class poptorch_geometric.stream_packing_sampler.StreamPackingSampler(data_source, max_num_graphs, max_num_nodes=None, max_num_edges=None, base_sampler=None, allow_skip_data=False)
Wraps a sampler to generate mini-batches of graphs with potentially varying batch sizes. StreamPackingSampler creates batches by adding graphs to the batch one at a time without exceeding the maximum number of nodes, edges or graphs. This gives similar results to packing without requiring the dataset to be preprocessed.
- Parameters
data_source (torch_geometric.data.Dataset) – The data source to process.
max_num_graphs (int) – The maximum number of graphs to include in a batch.
max_num_nodes (int, optional) – The maximum number of nodes allowed in a batch. (default: None)
max_num_edges (int, optional) – The maximum number of edges allowed in a batch. (default: None)
base_sampler (Sampler or Iterable, optional) – The base sampler used to sample the graphs before packing them into a batch. This can be any iterable object. (default: SequentialSampler(data_source))
allow_skip_data (bool, optional) – If True, allows a data_source item to be skipped. Otherwise, a RuntimeError will be raised when the sampler is not able to form a single-item batch from data_source because the item exceeds the maximum batch requirements. (default: False)
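The packing loop described above can be sketched in plain Python over per-sample node and edge counts. `stream_packing_batches` is hypothetical and is not the StreamPackingSampler implementation. It yields lists of sample indices, and its `order` argument stands in for base_sampler. Treating a limit of None as unbounded is an assumption of the sketch.

```python
from typing import Iterable, Iterator, List, Optional, Sequence


def stream_packing_batches(node_counts: Sequence[int],
                           edge_counts: Sequence[int],
                           max_num_graphs: int,
                           max_num_nodes: Optional[int] = None,
                           max_num_edges: Optional[int] = None,
                           order: Optional[Iterable[int]] = None,
                           allow_skip_data: bool = False) -> Iterator[List[int]]:
    """Yield lists of sample indices packed under the given limits."""
    # A limit of None is treated as unbounded in this sketch.
    node_limit = float("inf") if max_num_nodes is None else max_num_nodes
    edge_limit = float("inf") if max_num_edges is None else max_num_edges
    # `order` plays the role of base_sampler; the default is sequential.
    indices = range(len(node_counts)) if order is None else order
    batch: List[int] = []
    nodes = edges = 0
    for idx in indices:
        n, e = node_counts[idx], edge_counts[idx]
        if n > node_limit or e > edge_limit:
            # The sample cannot form even a single-item batch.
            if allow_skip_data:
                continue
            raise RuntimeError(f"sample {idx} exceeds the batch limits")
        full = (len(batch) == max_num_graphs or nodes + n > node_limit
                or edges + e > edge_limit)
        if batch and full:
            yield batch
            batch, nodes, edges = [], 0, 0
        batch.append(idx)
        nodes += n
        edges += e
    if batch:
        yield batch


print(list(stream_packing_batches([2, 3, 1, 4], [1, 2, 0, 5],
                                  max_num_graphs=3, max_num_nodes=5,
                                  max_num_edges=6)))
# [[0, 1], [2, 3]]
```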
6.5. Fixed size options
- class poptorch_geometric.fixed_size_options.FixedSizeOptions(num_nodes, num_edges=None, num_graphs=2, node_pad_value=None, edge_pad_value=None, graph_pad_value=None, pad_graph_defaults=None)
Class that holds the specification of how the mini-batches produced by a data loader are padded to a fixed size. This includes the numbers of nodes and edges that each batch is padded to.
- Parameters
num_nodes (int or dict) – The number of nodes after padding a batch. For heterogeneous graphs, this can be a dictionary giving the number of nodes for each node type.
num_edges (int or dict, optional) – The number of edges after padding a batch. For heterogeneous graphs, this can be a dictionary giving the number of edges for each edge type. (default: num_nodes * (num_nodes - 1))
num_graphs (int, optional) – The total number of graphs in the padded batch. This should be at least 2 to allow for creating at least one padding graph. The default value of 2 accounts for a single real graph and a single padding graph in a batch. (default: 2)
node_pad_value (float, optional) – The fill value to use for node features. (default: 0.0)
edge_pad_value (float, optional) – The fill value to use for edge features. (default: 0.0)
graph_pad_value (float, optional) – The fill value to use for graph features. (default: 0.0)
pad_graph_defaults (dict, optional) – The default values assigned, in the newly created padding graphs, to keys whose values are not of type torch.Tensor. (default: None)
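The documented defaults can be mirrored in a small dataclass for the homogeneous case. `PaddingSpec` is a hypothetical stand-in, not FixedSizeOptions itself. The ValueError for num_graphs below 2 is this sketch's own choice. The num_edges default reproduces the documented formula, which equals the number of directed edges in a complete graph without self-loops.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaddingSpec:
    """Hypothetical stand-in for the homogeneous FixedSizeOptions fields."""
    num_nodes: int
    num_edges: Optional[int] = None
    num_graphs: int = 2
    node_pad_value: float = 0.0
    edge_pad_value: float = 0.0
    graph_pad_value: float = 0.0

    def __post_init__(self) -> None:
        if self.num_edges is None:
            # Documented default: num_nodes * (num_nodes - 1), the number of
            # directed edges in a complete graph without self-loops.
            self.num_edges = self.num_nodes * (self.num_nodes - 1)
        if self.num_graphs < 2:
            # This sketch rejects values that leave no room for a padding
            # graph; the real class may handle this differently.
            raise ValueError("num_graphs should be at least 2")


spec = PaddingSpec(num_nodes=10)
print(spec.num_edges)  # 90
```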
- classmethod from_dataset(dataset, batch_size, sample_limit=None, progress_bar=None)
Returns a FixedSizeOptions object which is a valid set of options for the given dataset, ensuring that the numbers of nodes and edges allocated are sufficient for the dataset given a particular batch size.
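One way to guarantee that the allocation is sufficient is a worst-case bound under PadToMax-style batching: the sum over the batch_size - 1 largest samples. The sketch below computes that bound with a hypothetical `worst_case_sizes` helper. Whether from_dataset uses this exact rule is an assumption. Nodes and edges are maximised independently, so the bound is safe but can exceed any batch the dataset can actually produce. The padding graph's own nodes are not included.

```python
from typing import List, Tuple


def worst_case_sizes(node_counts: List[int], edge_counts: List[int],
                     batch_size: int) -> Tuple[int, int]:
    """Upper bound on real nodes/edges in a batch of batch_size - 1 graphs."""
    real_graphs = batch_size - 1
    max_nodes = sum(sorted(node_counts, reverse=True)[:real_graphs])
    max_edges = sum(sorted(edge_counts, reverse=True)[:real_graphs])
    return max_nodes, max_edges


print(worst_case_sizes([5, 2, 8, 3], [4, 1, 10, 6], batch_size=3))  # (13, 16)
```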
- classmethod from_loader(loader, sample_limit=1000)
Returns a FixedSizeOptions object which is a valid set of options for the given data loader, ensuring that the numbers of nodes and edges allocated are approximately sufficient for the mini-batches produced by this data loader. As the underlying loader is unlikely to produce an exhaustive combination of samples in a mini-batch, the FixedSizeOptions returned can only be an approximation of the maximum values required.
- Parameters
loader (DataLoader) – The data loader for which to compute the options.
sample_limit (int) –
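Why the result is only approximate can be seen by taking the maximum over a limited number of sampled mini-batches. The sketch below assumes, as labelled, that sample_limit bounds the number of mini-batches inspected. `estimate_from_batches` is hypothetical, and each mini-batch is represented as a list of (num_nodes, num_edges) pairs.

```python
from itertools import islice
from typing import Iterable, List, Tuple


def estimate_from_batches(batches: Iterable[List[Tuple[int, int]]],
                          sample_limit: int = 1000) -> Tuple[int, int]:
    """Max total nodes and edges over at most sample_limit mini-batches."""
    max_nodes = max_edges = 0
    for batch in islice(batches, sample_limit):
        max_nodes = max(max_nodes, sum(n for n, _ in batch))
        max_edges = max(max_edges, sum(e for _, e in batch))
    return max_nodes, max_edges


batches = [[(3, 2), (4, 5)], [(6, 1), (2, 2)], [(9, 9)]]
print(estimate_from_batches(batches, sample_limit=2))  # (8, 7)
print(estimate_from_batches(batches, sample_limit=3))  # (9, 9)
```

With sample_limit=2, the larger third mini-batch is never seen, so the estimate undershoots.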
- is_hetero()
Returns whether the specified numbers of nodes and edges are in heterogeneous form, i.e. a number for each node and edge type.
- to_hetero(node_types, edge_types)
Converts a single specified number of nodes and edges to a heterogeneous form, a number for each node and edge type.
- property total_num_edges
The total number of edges summed for all the edge types.
- property total_num_nodes
The total number of nodes summed for all the node types.
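For heterogeneous options, the per-type counts are dictionaries, and the totals are their sums. The sketch below mirrors is_hetero and the total_num_nodes / total_num_edges properties with hypothetical free functions. Edge types are written as PyG-style (source type, relation, destination type) triples. The type names and counts are illustrative.

```python
from typing import Dict, Tuple, Union

NodeCounts = Union[int, Dict[str, int]]
EdgeCounts = Union[int, Dict[Tuple[str, str, str], int]]


def is_hetero(num_nodes: NodeCounts, num_edges: EdgeCounts) -> bool:
    """True when counts are given per node type and per edge type."""
    return isinstance(num_nodes, dict) and isinstance(num_edges, dict)


def total(counts: Union[NodeCounts, EdgeCounts]) -> int:
    """Sum over all types for dict counts; the single value otherwise."""
    return sum(counts.values()) if isinstance(counts, dict) else counts


num_nodes = {"author": 20, "paper": 30}
num_edges = {("author", "writes", "paper"): 60, ("paper", "cites", "paper"): 40}
print(is_hetero(num_nodes, num_edges))     # True
print(total(num_nodes), total(num_edges))  # 50 100
```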