6. API reference

6.1. Data loaders

class poptorch_geometric.dataloader.DataLoader(dataset, batch_size=1, shuffle=False, follow_batch=None, exclude_keys=None, options=None, **kwargs)

A data loader which merges data objects from a torch_geometric.data.Dataset into a mini-batch. Data objects can be either of type Data or HeteroData.

Parameters
  • dataset (Dataset) – The dataset from which to load the data.

  • batch_size (int, optional) – How many samples per batch to load. (default: 1)

  • shuffle (bool, optional) – If set to True, the data will be reshuffled at every epoch. (default: False)

  • follow_batch (List[str], optional) – Creates assignment batch vectors for each key in the list. (default: None)

  • exclude_keys (List[str], optional) – Will exclude each key in the list. (default: None)

  • options (poptorch.Options, optional) – The additional PopTorch options to be passed to poptorch.DataLoader. (default: None)

  • **kwargs (optional) – Additional arguments of poptorch.DataLoader.

class poptorch_geometric.dataloader.FixedSizeDataLoader(dataset, batch_size=2, shuffle=False, fixed_size_options=None, fixed_size_strategy=FixedSizeStrategy.PadToMax, over_size_strategy=OverSizeStrategy.Error, add_pad_masks=True, follow_batch=None, exclude_keys=None, options=None, **kwargs)

A data loader which merges data objects from poptorch.Dataset into a mini-batch and pads node and edge features so tensors across all mini-batches have the same shapes.

Data objects can be either of type Data or HeteroData.

Parameters
  • dataset (Dataset) – The Dataset instance from which to load the graph samples.

  • batch_size (int, optional) – The number of graph samples to load in each mini-batch. This should be at least 2 to allow for creating at least one padding graph. (default: 2)

  • shuffle (bool, optional) – If True, the data will be reshuffled at every epoch. (default: False)

  • fixed_size_options (FixedSizeOptions, optional) – A poptorch_geometric.fixed_size_options.FixedSizeOptions object which holds the maximum number of nodes, edges and other options required to pad the mini-batches, produced by the data loader, to a fixed size. If not specified, this will be determined from the provided dataset. (default: None)

  • fixed_size_strategy (FixedSizeStrategy, optional) – The strategy to use to achieve fixed-size mini-batches. By default, each mini-batch will contain a fixed number of real graphs (batch_size - 1) plus one single graph for padding. (default: poptorch_geometric.FixedSizeStrategy.PadToMax)

  • over_size_strategy (OverSizeStrategy, optional) – The behaviour if a sample cannot fit in the fixed-size mini-batch. By default, if the required number of samples cannot fit into the fixed-sized batch an error will be raised. (default: poptorch_geometric.OverSizeStrategy.Error)

  • add_pad_masks (bool, optional) –

    If True, mask objects are attached to the mini-batch result. They represent three levels of padding:

    • graphs_mask - graph level mask

    • nodes_mask - node level mask

    • edges_mask - edge level mask

    Mask objects indicate which elements in the mini-batch are real (represented by True) and which were added as padding (represented by False). (default: True)

  • follow_batch (list or tuple, optional) – Creates assignment batch vectors for each key in the list. (default: None)

  • exclude_keys (list or tuple, optional) – Keys to exclude from the batch. (default: None)

  • options (poptorch.Options, optional) – The additional PopTorch options to be passed to poptorch.DataLoader. (default: None)

  • **kwargs (optional) – Additional arguments of poptorch.DataLoader.
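The default PadToMax behaviour described above can be sketched in plain Python. This is an illustration only, not the library's implementation: the function name pad_to_max_batches and the representation of each graph by its node count are assumptions made for this example.

```python
from typing import List


def pad_to_max_batches(
    node_counts: List[int],  # hypothetical input: node count of each graph
    batch_size: int,
    num_nodes: int,
) -> List[List[int]]:
    """Group graphs into (batch_size - 1) real graphs plus one padding graph.

    Each returned batch lists the node count of every graph in it. The last
    entry is the padding graph, sized so the batch totals exactly num_nodes.
    """
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2 to leave room for a padding graph")
    real_per_batch = batch_size - 1
    batches = []
    for start in range(0, len(node_counts), real_per_batch):
        real = node_counts[start:start + real_per_batch]
        padding = num_nodes - sum(real)
        if padding < 0:
            raise RuntimeError("real graphs exceed the fixed node budget")
        # A short final batch is kept as-is here for simplicity; how the
        # real loader treats it is not modelled in this sketch.
        batches.append(real + [padding])
    return batches
```

With node counts [3, 5, 2, 4], batch_size=3 and num_nodes=12, every batch in the sketch sums to 12 nodes, which is what gives all mini-batches the same tensor shapes.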

class poptorch_geometric.pyg_dataloader.FixedSizeStrategy(value)

Specify the strategy to use to achieve fixed-size mini-batches.

  • PadToMax: Each mini-batch will contain a fixed number of real graphs plus one single graph for padding.

  • StreamPack: If the next sample to batch can fit in the mini-batch it will be added. This results in mini-batches with a varied number of real graphs, but minimises the amount of wasted space in a mini-batch due to padding.

class poptorch_geometric.pyg_dataloader.OverSizeStrategy(value)

Specify the behaviour if a sample cannot fit in the fixed-size mini-batch.

  • Error: If the required number of samples cannot fit into a mini-batch, an error will be thrown.

  • Skip: If the required number of samples cannot fit into a mini-batch, the samples that cannot fit will be skipped.

  • TrimNodes: If the required number of samples cannot fit into a mini-batch, the samples will still be added and then nodes will be removed from the mini-batch to achieve the fixed size. Enabling this can cause a loss of information in the samples of the mini-batch.

  • TrimEdges: If the required number of samples cannot fit into a mini-batch, the samples will still be added and then edges will be removed from the mini-batch to achieve the fixed size. Enabling this can cause a loss of information in the samples of the mini-batch.

  • TrimNodesAndEdges: If the required number of samples cannot fit into a mini-batch, the samples will still be added and then both nodes and edges will be removed from the mini-batch to achieve the fixed size. Enabling this can cause a loss of information in the samples of the mini-batch.
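The difference between the Error, Skip and TrimNodes behaviours can be sketched as follows. The enum OverSize and the function fit_nodes are hypothetical stand-ins written for this example, and only node counts are modelled. The trimming here removes nodes from the last samples so that the result is deterministic, which is a simplification and may not match how the library chooses nodes to remove.

```python
from enum import Enum
from typing import List


class OverSize(Enum):  # hypothetical stand-in for OverSizeStrategy
    ERROR = "Error"
    SKIP = "Skip"
    TRIM_NODES = "TrimNodes"


def fit_nodes(sample_nodes: List[int], max_nodes: int, strategy: OverSize) -> List[int]:
    """Return the per-sample node counts that end up in one mini-batch."""
    total = sum(sample_nodes)
    if total <= max_nodes:
        return list(sample_nodes)
    if strategy is OverSize.ERROR:
        raise RuntimeError(f"{total} nodes exceed the limit of {max_nodes}")
    if strategy is OverSize.SKIP:
        # Keep samples while they fit and leave out the ones that do not.
        kept, used = [], 0
        for n in sample_nodes:
            if used + n <= max_nodes:
                kept.append(n)
                used += n
        return kept
    # TRIM_NODES: keep every sample but drop surplus nodes (information is lost).
    surplus = total - max_nodes
    trimmed = list(sample_nodes)
    for i in reversed(range(len(trimmed))):
        cut = min(surplus, trimmed[i])
        trimmed[i] -= cut
        surplus -= cut
        if surplus == 0:
            break
    return trimmed
```

Skip preserves each sample intact at the cost of dropping whole samples, whereas the trimming strategies keep every sample but alter its structure.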

6.2. Cluster data loaders

class poptorch_geometric.cluster_loader.FixedSizeClusterLoader(cluster_data, fixed_size_options, batch_size=1, over_size_strategy=OverSizeStrategy.TrimNodesAndEdges, add_pad_masks=True, options=None, **kwargs)

A data loader which merges data objects from a torch_geometric.loader.ClusterData to a mini-batch of clusters and pads node and edge features so tensors across all batches have constant shapes.

Parameters
  • cluster_data (ClusterData) – The cluster from which to load the data.

  • fixed_size_options (FixedSizeOptions) – A poptorch_geometric.fixed_size_options.FixedSizeOptions object which holds the maximum number of nodes, edges and other options required to pad the mini-batches, produced by the data loader, to a fixed size.

  • batch_size (int, optional) – The number of nodes per mini-batch to load. (default: 1)

  • over_size_strategy (OverSizeStrategy, optional) – The behaviour if a sample cannot fit in the fixed-size mini-batch. By default, if the required number of samples cannot fit into the fixed-sized mini-batch, nodes and edges will be removed from the mini-batch to achieve the specified fixed size. (default: poptorch_geometric.OverSizeStrategy.TrimNodesAndEdges)

  • add_pad_masks (bool, optional) –

    If True, mask objects are attached to the mini-batch result. They represent three levels of padding:

    • graphs_mask - graph level mask

    • nodes_mask - node level mask

    • edges_mask - edge level mask

    Mask objects indicate which elements in the mini-batch are real (represented by True) and which were added as padding (represented by False). (default: True)

  • options (poptorch.Options, optional) – The additional PopTorch options to be passed to poptorch.DataLoader. (default: None)

  • **kwargs (optional) – The additional arguments of poptorch.DataLoader.

6.3. Collators

class poptorch_geometric.collate.FixedSizeCollater(fixed_size_options, add_masks_to_batch=False, trim_nodes=False, trim_edges=False, follow_batch=None, exclude_keys=None)

Collates a batch of graphs as a torch_geometric.data.Batch of fixed-size tensors.

Calling an instance of this class adds an additional graph with the necessary number of nodes and edges to pad the batch so that tensors have the size corresponding to the maximum numbers of graphs, nodes and edges specified during initialisation.

Calling an instance of this class raises a RuntimeError if the number of graphs (if set), nodes or edges in the batch is larger than the requested limits.

Parameters
  • fixed_size_options (FixedSizeOptions) – A poptorch_geometric.fixed_size_options.FixedSizeOptions object which holds the maximum number of nodes, edges and other options required to pad the batches, produced by the collater, to a fixed size.

  • add_masks_to_batch (bool, optional) –

    If set to True, mask objects are attached to the batch result. They represent three levels of padding:

    • graphs_mask - graph level mask

    • nodes_mask - node level mask

    • edges_mask - edge level mask

    Mask objects indicate which elements in the batch are real (represented by True) and which were added as padding (represented by False). (default: False)

  • trim_nodes (bool, optional) – If set to True, randomly prune nodes from the batch so that it satisfies the num_nodes limit. (default: False)

  • trim_edges (bool, optional) – If set to True, randomly prune edges from the batch so that it satisfies the num_edges limit. (default: False)

  • follow_batch (list or tuple, optional) – Creates assignment batch vectors for each key in the list. (default: None)

  • exclude_keys (list or tuple, optional) – The keys to exclude from the graphs in the output batch. (default: None)

Return type

None
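The relationship between the fixed-size limits, the padding graph and the three masks can be sketched in plain Python. This is not the collater's implementation: the function pad_batch_masks and the representation of a graph as a (num_nodes, num_edges) tuple are assumptions for illustration, and the sketch assumes that one slot is always reserved for the padding graph.

```python
from typing import Dict, List, Tuple


def pad_batch_masks(
    graphs: List[Tuple[int, int]],  # hypothetical: (num_nodes, num_edges) per real graph
    num_graphs: int,
    num_nodes: int,
    num_edges: int,
) -> Dict[str, List[bool]]:
    """Build boolean masks for a batch padded up to the fixed totals.

    True marks a real element and False marks padding.
    """
    real_graphs = len(graphs)
    real_nodes = sum(n for n, _ in graphs)
    real_edges = sum(e for _, e in graphs)
    # Assumption: one graph slot is kept free for the padding graph.
    if real_graphs > num_graphs - 1 or real_nodes > num_nodes or real_edges > num_edges:
        raise RuntimeError("batch is larger than the requested fixed size")
    return {
        "graphs_mask": [True] * real_graphs + [False] * (num_graphs - real_graphs),
        "nodes_mask": [True] * real_nodes + [False] * (num_nodes - real_nodes),
        "edges_mask": [True] * real_edges + [False] * (num_edges - real_edges),
    }
```

A typical use of such masks is to exclude padded elements when computing a loss or a metric, so that padding does not influence the result.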

class LabelsType(value)

An enumeration.

6.4. Batch samplers

class poptorch_geometric.stream_packing_sampler.StreamPackingSampler(data_source, max_num_graphs, max_num_nodes=None, max_num_edges=None, base_sampler=None, allow_skip_data=False)

Wraps a sampler to generate mini-batches of graphs with potentially varying batch sizes. StreamPackingSampler builds each batch by adding one graph at a time without exceeding the maximum number of nodes, edges, or graphs. This gives similar results to packing without requiring the dataset to be preprocessed.

Parameters
  • data_source (torch_geometric.data.Dataset) – The data source to process.

  • max_num_graphs (int) – The maximum number of graphs to include in a batch.

  • max_num_nodes (int, optional) – The maximum number of nodes allowed in a batch. (default: None)

  • max_num_edges (int, optional) – The maximum number of edges allowed in a batch. (default: None)

  • base_sampler (Sampler or Iterable, optional) – The base sampler used to sample the graphs before packing them into a batch. This can be any iterable object. (default: SequentialSampler(data_source))

  • allow_skip_data (bool, optional) – If True, a data_source item that exceeds the maximum batch requirements on its own is skipped. Otherwise, a RuntimeError is raised when the sampler cannot form a single-item batch from data_source because the item exceeds the maximum batch requirements. (default: False)
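The packing idea can be sketched as a small generator. This is a simplified illustration rather than the sampler's actual code: the function stream_pack is hypothetical, graphs are represented only by their node counts, items are visited sequentially, and the edge limit is left out for brevity.

```python
from typing import Iterator, List, Optional


def stream_pack(
    node_counts: List[int],  # hypothetical input: node count of each graph
    max_num_graphs: int,
    max_num_nodes: Optional[int] = None,
    allow_skip_data: bool = False,
) -> Iterator[List[int]]:
    """Yield batches of indices, adding graphs until a limit would be exceeded."""
    batch: List[int] = []
    nodes_in_batch = 0
    for idx, n in enumerate(node_counts):
        if max_num_nodes is not None and n > max_num_nodes:
            # This graph cannot form even a single-item batch.
            if allow_skip_data:
                continue
            raise RuntimeError(f"graph {idx} exceeds the node limit on its own")
        over_nodes = max_num_nodes is not None and nodes_in_batch + n > max_num_nodes
        if batch and (len(batch) == max_num_graphs or over_nodes):
            # The next graph does not fit, so close the current batch.
            yield batch
            batch, nodes_in_batch = [], 0
        batch.append(idx)
        nodes_in_batch += n
    if batch:
        yield batch
```

Because batches are closed only when the next graph would not fit, the number of graphs per batch varies while the node budget is used more fully than with a fixed number of graphs per batch.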

6.5. Fixed size options

class poptorch_geometric.fixed_size_options.FixedSizeOptions(num_nodes, num_edges=None, num_graphs=2, node_pad_value=None, edge_pad_value=None, graph_pad_value=None, pad_graph_defaults=None)

Holds the specification of how the mini-batches produced by a data loader are padded to a fixed size. This includes the number of nodes and edges that each batch is padded up to, along with the values used to fill the padding.

Parameters
  • num_nodes (int or dict) – The number of nodes after padding a batch. In heterogeneous graphs, this can be a dictionary denoting the number of nodes for specific node types.

  • num_edges (int or dict, optional) – The number of edges after padding a batch. In heterogeneous graphs, this can be a dictionary denoting the number of edges for specific edge types. (default: num_nodes * (num_nodes - 1))

  • num_graphs (int, optional) – The total number of graphs in the padded batch. This should be at least 2 to allow for creating at least one padding graph. The default value is 2 accounting for a single real graph and a single padded graph in a batch. (default: 2)

  • node_pad_value (float, optional) – The fill value to use for node features. (default: 0.0)

  • edge_pad_value (float, optional) – The fill value to use for edge features. (default: 0.0)

  • graph_pad_value (float, optional) – The fill value to use for graph features. (default: 0.0)

  • pad_graph_defaults (dict, optional) – The default values that will be assigned to the keys of types different to torch.Tensor in the newly created padding graphs. (default: None)
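The documented default for num_edges grows quadratically with num_nodes, which the following arithmetic sketch makes concrete. The helper default_num_edges is hypothetical and only mirrors the formula given in the parameter description.

```python
from typing import Optional


def default_num_edges(num_nodes: int, num_edges: Optional[int] = None) -> int:
    """Resolve num_edges, falling back to num_nodes * (num_nodes - 1)."""
    if num_edges is not None:
        return num_edges
    # A directed graph without self-loops or duplicate edges has at most
    # num_nodes * (num_nodes - 1) edges, so the default is an upper bound.
    return num_nodes * (num_nodes - 1)
```

For num_nodes=100 the default is 9900 edges, so for sparse graphs it may be worth passing a smaller explicit num_edges to reduce the amount of padding.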

classmethod from_dataset(dataset, batch_size, sample_limit=None, progress_bar=None)

Returns a FixedSizeOptions object which is a valid set of options for the given dataset, ensuring that the number of nodes and edges allocated are enough for the dataset given a particular batch size.
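One conservative way to derive such limits is sketched below. The function sizes_from_dataset is hypothetical, and the formula is an assumption for illustration rather than a statement of what from_dataset computes: it assumes each batch holds (batch_size - 1) real graphs plus one padding graph, and reserves one node and one edge for the padding graph.

```python
from typing import List, Tuple


def sizes_from_dataset(
    graphs: List[Tuple[int, int]],  # hypothetical: (num_nodes, num_edges) per graph
    batch_size: int,
) -> Tuple[int, int, int]:
    """Return (num_nodes, num_edges, num_graphs) large enough for any batch."""
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2 to leave room for a padding graph")
    max_nodes = max(n for n, _ in graphs)
    max_edges = max(e for _, e in graphs)
    real = batch_size - 1
    # Worst case: every real graph in the batch is the largest one.
    return max_nodes * real + 1, max_edges * real + 1, batch_size
```

Sizing for the worst case guarantees that no batch overflows, at the cost of more padding when most graphs are much smaller than the largest one.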

classmethod from_loader(loader, sample_limit=1000)

Returns a FixedSizeOptions object which is a valid set of options for the given data loader, ensuring that the number of nodes and edges allocated are approximately enough for the mini-batches produced by this data loader. As the underlying loader is unlikely to produce an exhaustive combination of samples in a mini-batch, the FixedSizeOptions returned can only be an approximation of the maximum values required.
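Why the result is only an approximation can be seen from a sampling sketch. The function sizes_from_loader is hypothetical, batches are represented as (num_nodes, num_edges) tuples, and treating sample_limit as a number of mini-batches is an assumption made for this example.

```python
from itertools import islice
from typing import Iterable, Tuple


def sizes_from_loader(
    batches: Iterable[Tuple[int, int]],  # hypothetical: (num_nodes, num_edges) per mini-batch
    sample_limit: int = 1000,
) -> Tuple[int, int]:
    """Return the largest node and edge counts seen in the sampled mini-batches."""
    max_nodes = max_edges = 0
    for nodes, edges in islice(batches, sample_limit):
        max_nodes = max(max_nodes, nodes)
        max_edges = max(max_edges, edges)
    return max_nodes, max_edges
```

Any mini-batch that is not sampled, or any combination of samples that the loader did not happen to produce, can exceed the returned values, so some headroom or an over-size strategy other than Error may be needed.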

is_hetero()

Returns whether the specified numbers of nodes and edges are in heterogeneous form, that is, a number for each node and edge type.

to_hetero(node_types, edge_types)

Converts a single specified number of nodes and edges to heterogeneous form, that is, a number for each node and edge type.

property total_num_edges

The total number of edges summed for all the edge types.

property total_num_nodes

The total number of nodes summed for all the node types.
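The homogeneous and heterogeneous forms, and the totals derived from them, can be sketched with plain dictionaries. The functions below are hypothetical stand-ins for the methods and properties above. Assigning the same limit to every type in to_hetero, and requiring both values to be dictionaries in is_hetero, are assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable, Tuple, Union

Count = Union[int, Dict[Hashable, int]]


def is_hetero(num_nodes: Count, num_edges: Count) -> bool:
    """Heterogeneous form means one number per node type and per edge type."""
    return isinstance(num_nodes, dict) and isinstance(num_edges, dict)


def to_hetero(
    num_nodes: int,
    num_edges: int,
    node_types: Iterable[Hashable],
    edge_types: Iterable[Hashable],
) -> Tuple[Dict[Hashable, int], Dict[Hashable, int]]:
    """Give every node type and every edge type the same single limit."""
    return (
        {t: num_nodes for t in node_types},
        {t: num_edges for t in edge_types},
    )


def total(counts: Count) -> int:
    """Sum the per-type counts, or return the single count unchanged."""
    return sum(counts.values()) if isinstance(counts, dict) else counts
```

For example, converting num_nodes=10 for the node types "paper" and "author" gives a limit of 10 for each type and a total of 20 nodes.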