TorchDataLoader

class Dock2D.Utility.TorchDataLoader.InteractionPoseDataset(path, max_size=None)

__getitem__(index)

Returns: values at index of the docking (pose) data

__init__(path, max_size=None)

Parameters

path – path to docking dataset .pkl file.
max_size – maximum number of docking examples to load into the data stream

__len__()

Returns: length of the dataset
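A map-style dataset only needs __getitem__ and __len__, and those are the two methods InteractionPoseDataset exposes. The sketch below assumes the .pkl file holds a list of (receptor, ligand, rotation, translation) examples, and that max_size truncates that list. Both are assumptions. The class name PoseDatasetSketch is hypothetical and not part of Dock2D. The sketch uses only the standard library. torch.utils.data.DataLoader accepts any object that provides these two methods.

```python
import os
import pickle
import tempfile


class PoseDatasetSketch:
    """Hypothetical stand-in for InteractionPoseDataset (assumed .pkl layout)."""

    def __init__(self, path, max_size=None):
        # Assumption: the pickle holds a list of (receptor, ligand, rotation, translation).
        with open(path, "rb") as f:
            data = pickle.load(f)
        # Keep only the first max_size examples when a limit is given.
        self.data = data if max_size is None else data[:max_size]

    def __getitem__(self, index):
        # Return the docking example stored at this index.
        return self.data[index]

    def __len__(self):
        return len(self.data)


# Write a tiny toy dataset to a temporary .pkl file and load part of it.
toy = [([0.0], [1.0], 15.0, (2, 3)), ([1.0], [0.0], 90.0, (0, 1)), ([0.5], [0.5], 0.0, (1, 1))]
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as tmp:
    pickle.dump(toy, tmp)
dataset = PoseDatasetSketch(tmp.name, max_size=2)
os.remove(tmp.name)
print(len(dataset), dataset[1])
```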

class Dock2D.Utility.TorchDataLoader.InteractionFactDataset(path, number_of_pairs=None, randomstate=None)

__getitem__(index)

Returns: values at index of interaction data

__init__(path, number_of_pairs=None, randomstate=None)

Load data from a .pkl dataset file and build a data stream from the protein pool, interaction indices, and labels. The entire dataset is shuffled.
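The sketch below gives a rough illustration of that build step. It assumes the pickle provides three things: a protein pool (a list of shapes), a list of (i, j) index pairs into that pool, and a parallel list of 0/1 labels. This layout and the function name build_fact_stream are hypothetical. The function pairs each index pair with its two proteins and its label, then shuffles the whole list once.

```python
import random


def build_fact_stream(protein_pool, interaction_indices, labels, seed=None):
    """Hypothetical sketch: turn index pairs into [receptor, ligand, label] examples."""
    stream = [
        [protein_pool[i], protein_pool[j], label]
        for (i, j), label in zip(interaction_indices, labels)
    ]
    # Shuffle the entire dataset once, as the docstring describes.
    random.Random(seed).shuffle(stream)
    return stream


pool = ["A", "B", "C"]  # toy stand-ins for protein shapes
indices = [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]
labels = [1, 0, 1, 1, 0, 1]
stream = build_fact_stream(pool, indices, labels, seed=0)
print(len(stream), stream[0])
```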

Parameters

path – path to interaction dataset .pkl file.
number_of_pairs –
specifies the data stream size as the number of unique interactions,

\[\frac{N(N+1)}{2}\]

where N is number_of_pairs. If number_of_pairs is None, the entire upper triangle plus diagonal of the interaction pairs array is used.
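The count N(N+1)/2 can be checked by treating N as the size of a square interaction array. Under that reading, the unique unordered pairs, including each item paired with itself, are exactly the entries on or above the diagonal. The short check below uses itertools. The helper upper_triangle_pairs is illustrative only. numpy.triu_indices(N) yields the same index pairs in the same order.

```python
from itertools import combinations_with_replacement


def upper_triangle_pairs(n):
    """Index pairs (i, j) with i <= j: the upper triangle plus diagonal of an n x n array."""
    return list(combinations_with_replacement(range(n), 2))


for n in (1, 2, 5, 10):
    pairs = upper_triangle_pairs(n)
    # Each count matches N(N+1)/2.
    assert len(pairs) == n * (n + 1) // 2
    print(n, len(pairs))

print(upper_triangle_pairs(3))
# → [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]
```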

__len__()

Returns: length of the dataset

Dock2D.Utility.TorchDataLoader.get_docking_stream(data_path, shuffle=True, max_size=None, num_workers=0)

Get docking data as a torch data stream, randomly shuffled each epoch when shuffle is True.

Parameters

data_path – path to dataset .pkl file.
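shuffle – if True, shuffle examples each epoch using RandomSampler(); otherwise iterate in dataset order
max_size – maximum number of docking examples to load into the data stream
num_workers – number of DataLoader worker processes (0 loads data in the main process)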

Returns

docking data stream whose examples have the format [receptor, ligand, rotation, translation] (see DatasetGeneration).
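The effect of shuffle can be imitated without torch. With shuffle=True, torch.utils.data.RandomSampler draws a new permutation of the indices at every epoch. Without it, a DataLoader visits indices in order (SequentialSampler). The stdlib sketch below imitates that behavior with the hypothetical helper epoch_order, and unpacks each example in the [receptor, ligand, rotation, translation] format. The toy values are made up.

```python
import random


def epoch_order(n, shuffle, rng):
    """Hypothetical helper: index order for one epoch, imitating RandomSampler vs. sequential order."""
    order = list(range(n))
    if shuffle:
        # A fresh permutation is drawn every epoch.
        rng.shuffle(order)
    return order


# Toy docking examples in [receptor, ligand, rotation, translation] format.
examples = [["rec%d" % k, "lig%d" % k, 10.0 * k, (k, -k)] for k in range(5)]
rng = random.Random(42)

for epoch in range(2):
    order = epoch_order(len(examples), shuffle=True, rng=rng)
    for idx in order:
        # Unpack one example; a training step would use these four fields.
        receptor, ligand, rotation, translation = examples[idx]
    print("epoch", epoch, "visits indices", order)

print("unshuffled", epoch_order(len(examples), shuffle=False, rng=rng))
```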

Dock2D.Utility.TorchDataLoader.get_interaction_stream(data_path, number_of_pairs=None, randomstate=None, num_workers=0)

Get interaction data as a torch data stream. Setting number_of_pairs to N yields \(\frac{N(N+1)}{2}\) unique interactions. The fact-of-interaction data stream shuffles examples when selecting number_of_pairs from the entire dataset, and it shuffles the selected data stream again before each epoch.
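The sketch below shows one plausible reading of those two shuffles. It shuffles the full example list once, keeps the first N(N+1)/2 examples, and then reshuffles that subset at every epoch. The function names and the selection-by-truncation rule are assumptions for illustration, not Dock2D's code.

```python
import random


def select_subset(examples, number_of_pairs, rng):
    """Hypothetical: shuffle all examples once, then keep N(N+1)/2 of them."""
    pool = list(examples)
    rng.shuffle(pool)
    if number_of_pairs is None:
        return pool
    size = number_of_pairs * (number_of_pairs + 1) // 2
    return pool[:size]


def epochs(subset, num_epochs, rng):
    """Yield the selected subset in a new random order for each epoch."""
    for _ in range(num_epochs):
        order = list(subset)
        rng.shuffle(order)
        yield order


rng = random.Random(7)
all_examples = list(range(100))  # toy stand-ins for [receptor, ligand, label] examples
subset = select_subset(all_examples, number_of_pairs=4, rng=rng)
for epoch, order in enumerate(epochs(subset, num_epochs=2, rng=rng)):
    print(epoch, order)
```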

Note

For a resumable data stream (e.g. for the SampleBuffer in Monte Carlo fact of interaction), pass a numpy.random.RandomState() as randomstate. This results in a one-time shuffle at data stream initialization that stays the same when saved models are reloaded.
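A seeded numpy.random.RandomState makes the stream resumable because the same seed reproduces the same permutation in a new process. The example order seen before a checkpoint can therefore be rebuilt when training resumes. The helper one_time_shuffle and the seed value are illustrative assumptions.

```python
import numpy as np


def one_time_shuffle(num_examples, randomstate):
    """Permutation applied once when the stream is built (illustrative, not Dock2D's code)."""
    return randomstate.permutation(num_examples)


# First run: build the stream with a seeded RandomState.
first = one_time_shuffle(10, np.random.RandomState(1234))

# Later run (e.g. after reloading a saved model): same seed, same order.
resumed = one_time_shuffle(10, np.random.RandomState(1234))

print(first)
print(np.array_equal(first, resumed))  # True
```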

Parameters

data_path – path to dataset .pkl file.
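number_of_pairs – N, which sizes the data stream at \(\frac{N(N+1)}{2}\) unique interaction examples
randomstate – optional numpy.random.RandomState for a one-time, reproducible shuffle at data stream initialization (see the Note above)
num_workers – number of DataLoader worker processes (0 loads data in the main process)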

Returns

interaction data stream whose examples have the format [receptor, ligand, label], where label is 1 (interacting) or 0 (non-interacting) (see DatasetGeneration).
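Each example ends with a 1 or 0 label, so the stream can be checked for class balance. The sketch below assumes the stream yields [receptor, ligand, label] triples as documented. label_counts is a hypothetical helper, and the toy stream is made up.

```python
from collections import Counter


def label_counts(stream):
    """Count interacting (1) vs. non-interacting (0) examples in a [receptor, ligand, label] stream."""
    return Counter(int(label) for _receptor, _ligand, label in stream)


# Toy stream; real receptor and ligand entries would be arrays.
toy_stream = [["r0", "l0", 1], ["r0", "l1", 0], ["r1", "l1", 1], ["r1", "l2", 0], ["r2", "l2", 1]]
counts = label_counts(toy_stream)
print(counts[1], "positive,", counts[0], "negative")
```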