TorchDataLoader
- class Dock2D.Utility.TorchDataLoader.InteractionPoseDataset(path, max_size=None)
- __getitem__(index)
- Returns
the interaction data values at the given index
- __init__(path, max_size=None)
- Parameters
path – path to docking dataset .pkl file.
max_size – maximum number of docking examples to load into the data stream
- __len__()
- Returns
length of the dataset
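As a rough illustration of the `path` / `max_size` behaviour described above, here is a minimal sketch of a pickle-backed dataset. It is not the Dock2D implementation: the class `PickledListDataset`, the helper `write_toy_pickle`, and the assumption that the .pkl file holds a plain list of examples are all hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


class PickledListDataset:
    """Hypothetical stand-in for a .pkl-backed dataset (not the Dock2D class)."""

    def __init__(self, path, max_size=None):
        # Assumption: the pickle file stores a plain list of examples.
        with open(path, "rb") as fin:
            data = pickle.load(fin)
        # Slicing with None keeps every example; an integer truncates the list.
        self.data = data[:max_size]

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)


def write_toy_pickle(examples):
    """Write examples to a temporary .pkl file and return its path."""
    path = Path(tempfile.mkdtemp()) / "toy_docking.pkl"
    with open(path, "wb") as fout:
        pickle.dump(examples, fout)
    return path


# Toy records shaped like [receptor, ligand, rotation, translation].
toy_examples = [["rec%d" % i, "lig%d" % i, 0.0, (0, 0)] for i in range(5)]
toy_path = write_toy_pickle(toy_examples)
toy_dataset = PickledListDataset(toy_path, max_size=3)
```

With `max_size=3` only the first three toy examples are exposed; leaving `max_size` as `None` exposes all five.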
- class Dock2D.Utility.TorchDataLoader.InteractionFactDataset(path, number_of_pairs=None, randomstate=None)
- __getitem__(index)
- Returns
the interaction data values at the given index
- __init__(path, number_of_pairs=None, randomstate=None)
Load data from a .pkl dataset file. Build the data stream from the protein pool, interaction indices, and labels. The entire dataset is shuffled.
- Parameters
path – path to interaction dataset .pkl file.
number_of_pairs –
sets the data stream max_size through the number of unique interactions,
\[\frac{N(N+1)}{2}\]
where N is number_of_pairs. If N is None, the entire upper triangle plus diagonal of the interaction pairs array is used.
- __len__()
- Returns
length of the dataset
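The count \(\frac{N(N+1)}{2}\) quoted above is the number of index pairs (i, j) with i <= j in an N by N array, that is, the upper triangle plus the diagonal. The sketch below enumerates those pairs to check the formula; the function names are illustrative and do not come from Dock2D.

```python
from itertools import combinations_with_replacement


def unique_interaction_indices(n):
    """Index pairs (i, j) with i <= j: upper triangle plus diagonal."""
    return list(combinations_with_replacement(range(n), 2))


def unique_interaction_count(n):
    """Closed-form count N(N+1)/2 of the pairs enumerated above."""
    return n * (n + 1) // 2


pairs = unique_interaction_indices(4)
```

For N = 4 this gives 10 pairs: 4 on the diagonal and 6 above it.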
- Dock2D.Utility.TorchDataLoader.get_docking_stream(data_path, shuffle=True, max_size=None, num_workers=0)
Get docking data as a torch data stream that is randomly shuffled per epoch.
- Parameters
data_path – path to dataset .pkl file.
shuffle – shuffle using RandomSampler() or not
max_size – maximum number of docking examples to load into the data stream
num_workers – number of CPU worker processes used for data loading
- Returns
docking data stream in the format [receptor, ligand, rotation, translation] (see DatasetGeneration).
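To make "randomly shuffled per epoch" concrete, the following pure-Python sketch draws a fresh visiting order for each epoch when `shuffle` is true and keeps the stored order otherwise. It is a simplified stand-in for what a random sampler does, not the code behind `get_docking_stream`; `epoch_indices` and its `rng` argument are hypothetical names.

```python
import random


def epoch_indices(n_examples, shuffle=True, rng=None):
    """Return the order in which examples are visited during one epoch."""
    indices = list(range(n_examples))
    if shuffle:
        if rng is None:
            rng = random.Random()  # unseeded: a new order on every call
        rng.shuffle(indices)
    return indices


# Two epochs over the same six examples; with shuffling the orders may differ.
epoch_one = epoch_indices(6)
epoch_two = epoch_indices(6)
```

Every example still appears exactly once per epoch; only the order changes between epochs.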
- Dock2D.Utility.TorchDataLoader.get_interaction_stream(data_path, number_of_pairs=None, randomstate=None, num_workers=0)
Get interaction data as a torch data stream, specifying N as number_of_pairs, which results in \(\frac{N(N+1)}{2}\) unique interactions. The fact of interaction data stream shuffles examples when selecting number_of_pairs from the entire dataset, and also shuffles the selected data stream before each epoch.
Note
For a resumable data stream (e.g. for the SampleBuffer in Monte Carlo fact of interaction), set randomstate by passing a numpy.random.RandomState(). This results in a one-time shuffle at data stream initialization that can be maintained across loading saved models.
- Parameters
data_path – path to dataset .pkl file.
number_of_pairs – number of interaction pair examples to load into the data stream
randomstate – optional numpy.random.RandomState() used for a one-time, resumable shuffle (see Note)
num_workers – number of CPU worker processes used for data loading
- Returns
interaction data stream in the format [receptor, ligand, 1 or 0] (see DatasetGeneration).
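The Note on resumable streams relies on a seeded numpy.random.RandomState producing the same permutation every time it is recreated. The sketch below shows that property in isolation; the helper `one_time_shuffle` and the seed value 42 are assumptions made for illustration and are not part of Dock2D.

```python
import numpy as np


def one_time_shuffle(n_examples, randomstate):
    """Return a permutation of example indices drawn from the given RandomState."""
    return randomstate.permutation(n_examples)


# Recreating the RandomState with the same seed (42 is an arbitrary choice)
# reproduces the same example order, e.g. after reloading a saved model.
first_run = one_time_shuffle(10, np.random.RandomState(42))
resumed_run = one_time_shuffle(10, np.random.RandomState(42))
```

Because both runs see identical orderings, per-example state such as a sample buffer indexed by position stays aligned with the data after a restart.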