TorchDataLoader

class Dock2D.Utility.TorchDataLoader.InteractionPoseDataset(path, max_size=None)
__getitem__(index)
Returns

values at index of interaction data

__init__(path, max_size=None)
Parameters
  • path – path to docking dataset .pkl file.

  • max_size – number of docking examples to be loaded into data stream

__len__()
Returns

length of the dataset
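
The three methods above follow the map-style dataset protocol (index access plus a length). The following is a minimal sketch of that protocol, not the Dock2D implementation: the class name PickledListDataset, the toy tuples, and the assumptions that the .pkl file holds a plain list and that max_size keeps the first max_size examples are all hypothetical.

```python
import os
import pickle
import tempfile


class PickledListDataset:
    """Hypothetical stand-in, not the Dock2D class: a map-style dataset over a pickled list."""

    def __init__(self, path, max_size=None):
        with open(path, "rb") as fin:
            examples = pickle.load(fin)
        # Assumption: slicing with None keeps every example; an integer keeps the first max_size.
        self.examples = examples[:max_size]

    def __getitem__(self, index):
        return self.examples[index]

    def __len__(self):
        return len(self.examples)


# Toy data: four made-up (receptor, ligand, rotation, translation) tuples.
toy_examples = [("receptor%d" % i, "ligand%d" % i, 0.1 * i, (i, -i)) for i in range(4)]

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_docking.pkl")
    with open(toy_path, "wb") as fout:
        pickle.dump(toy_examples, fout)
    full_dataset = PickledListDataset(toy_path)
    capped_dataset = PickledListDataset(toy_path, max_size=2)

print(len(full_dataset), len(capped_dataset))
```

Any object exposing __getitem__ and __len__ in this way can be wrapped by a torch DataLoader, which is why the dataset classes on this page define exactly these methods.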

class Dock2D.Utility.TorchDataLoader.InteractionFactDataset(path, number_of_pairs=None, randomstate=None)
__getitem__(index)
Returns

values at index of interaction data

__init__(path, number_of_pairs=None, randomstate=None)

Load data from .pkl dataset file. Build datastream from protein pool, interaction indices, and labels. The entire dataset is shuffled.

Parameters
  • path – path to interaction dataset .pkl file.

  • number_of_pairs – sets the maximum size of the data stream as a number of unique interactions. Given N interaction pairs, the data stream holds

    \[\frac{N(N+1)}{2}\]

    unique interactions. If N is None, the entire upper triangle plus diagonal of the interaction pairs array is used.
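
The count of unique interactions can be checked by enumerating the upper triangle plus diagonal directly. This is a small sketch using only the standard library; the helper name unique_interaction_indices is hypothetical and is not part of Dock2D.

```python
from itertools import combinations_with_replacement


def unique_interaction_indices(n):
    """Index pairs (i, j) with i <= j: the upper triangle plus the diagonal of an n x n array."""
    return list(combinations_with_replacement(range(n), 2))


for n in (1, 3, 10):
    pairs = unique_interaction_indices(n)
    # The enumeration agrees with the closed form N(N+1)/2.
    assert len(pairs) == n * (n + 1) // 2
    print(n, len(pairs))
```

For N = 3 the pairs are (0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2), which gives 6 unique interactions.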

__len__()
Returns

length of the dataset

Dock2D.Utility.TorchDataLoader.get_docking_stream(data_path, shuffle=True, max_size=None, num_workers=0)

Get docking data as a torch data stream that is randomly shuffled per epoch.

Parameters
  • data_path – path to dataset .pkl file.

  • shuffle – whether to shuffle the data using RandomSampler()

  • max_size – number of docking examples to be loaded into data stream

  • num_workers – number of cpu threads

Returns

docking data stream in format of [receptor, ligand, rotation, translation] (see DatasetGeneration).

Dock2D.Utility.TorchDataLoader.get_interaction_stream(data_path, number_of_pairs=None, randomstate=None, num_workers=0)

Get interaction data as a torch data stream. Setting number_of_pairs to N results in \(\frac{N(N+1)}{2}\) unique interactions. The fact of interaction data stream shuffles examples when selecting number_of_pairs from the entire dataset, and also shuffles the selected data stream before each epoch.

Note

For a resumable data stream (e.g. for the SampleBuffer in Monte Carlo fact of interaction), set randomstate to a numpy.random.RandomState() instance. This results in a one-time shuffle at data stream initialization that can be maintained across loading saved models.
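
The idea behind a resumable stream can be sketched as follows: a shuffle driven by a seeded generator produces the same order every time it is rebuilt, so state stored per stream position still refers to the same example after a restart. This sketch uses the standard library's random.Random as a stand-in for numpy.random.RandomState, and the function name one_time_order is hypothetical; how Dock2D consumes the random state internally is not shown here.

```python
import random


def one_time_order(num_examples, seed):
    """Shuffle the example indices once with a seeded generator (stand-in for RandomState)."""
    rng = random.Random(seed)
    order = list(range(num_examples))
    rng.shuffle(order)
    return order


first_run = one_time_order(6, seed=42)
resumed_run = one_time_order(6, seed=42)  # e.g. rebuilt after loading a saved model

# The same seed reproduces the same order, so the stream can be resumed.
assert first_run == resumed_run

# State keyed by stream position (as a sample buffer might be) stays aligned with its example.
sample_buffer = {position: index for position, index in enumerate(first_run)}
assert [sample_buffer[p] for p in range(len(resumed_run))] == resumed_run
```

Without a fixed seed, each rebuild would produce a different order and any position-keyed state would no longer match its example.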

Parameters
  • data_path – path to dataset .pkl file.

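  • number_of_pairs – number of interaction pairs N to be loaded into data stream, giving \(\frac{N(N+1)}{2}\) unique interactions

  • randomstate – optional numpy.random.RandomState(); when set, the data stream is shuffled once at initialization so that it can be resumed (see the Note above)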

  • num_workers – number of cpu threads

Returns

interaction data stream [receptor, ligand, 1 or 0] (see DatasetGeneration).