check_shape_distributions.py

class Dock2D.Tests.check_shape_distributions.ShapeDistributions(protein_pool, dataset_name, show=False)
__init__(protein_pool, dataset_name, show=False)

Initialize checks for generated protein pool.

Parameters
  • protein_pool – protein pool filename.pkl

  • dataset_name – data set name

  • show – show plots (does not affect saving)

check_missing_examples(combination_list, found_list, protein_shapes, params_list)

Check for unrepresented combinations of parameters and regenerate them purely for plotting purposes. For example, given a small protein pool and a broad distribution of parameters, combinations in the tails of the distribution might never be encountered and saved to the protein pool.

Note

This is done only to give an idea of what the desired shape distribution might look like. To actually fill the gaps, the user must either increase the protein pool size or increase the relative probability of each missing parameter. Increasing the protein pool size is the simplest solution.

Parameters
  • combination_list – parameter combinations expected to encounter

  • found_list – parameter combinations encountered

  • protein_shapes – protein shapes extracted from protein pool file

  • params_list – list of parameters extracted from protein pool file

Returns

indices
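The missing-combination check described above can be illustrated with a minimal sketch. This is not the Dock2D implementation: the helper name `find_missing_combinations`, the tuple representation of a combination, and the alpha and num_points values are all assumptions made for illustration.

```python
from itertools import product


def find_missing_combinations(combination_list, found_list):
    """Return (index, combination) pairs for expected combinations never found.

    Hypothetical helper: assumes each combination is a hashable tuple
    such as (alpha, num_points).
    """
    found = set(found_list)
    return [(i, combo) for i, combo in enumerate(combination_list)
            if combo not in found]


# Illustrative parameter values only, not taken from any real protein pool.
alphas = [0.80, 0.85, 0.90]
num_points = [60, 80, 100]
expected = list(product(alphas, num_points))

# A small pool that happened to sample only three distinct combinations.
found = [(0.80, 60), (0.85, 80), (0.90, 60), (0.85, 80)]

missing = find_missing_combinations(expected, found)
for index, combo in missing:
    print(index, combo)
```

With a small pool like this one, most of the grid is unrepresented, which is the situation the note above describes.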

get_counts(counts)

Get the unique parameter values in sorted order, together with their counts.

Parameters

counts – unique counts of current parameter

Returns

unique, counts
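As a rough sketch of what returning sorted unique values with their counts can look like, the example below assumes `counts` is a dictionary mapping each parameter value to the number of times it occurred. That input type and the helper name `sorted_unique_counts` are assumptions, not confirmed by the documentation above.

```python
def sorted_unique_counts(counts):
    """Split a {value: count} mapping into parallel lists sorted by value.

    Hypothetical helper; the dictionary input is an assumption.
    """
    items = sorted(counts.items())
    unique = [value for value, _ in items]
    sorted_counts = [count for _, count in items]
    return unique, sorted_counts


# Illustrative counts for a single parameter, in arbitrary insertion order.
alpha_counts = {0.90: 3, 0.80: 2, 0.85: 5}
unique, sorted_counts = sorted_unique_counts(alpha_counts)
print(unique, sorted_counts)
```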

get_dict_counts(shape_params)

Initial counts of the shape-generating parameters used in the current protein pool.

Parameters

shape_params – list of shape parameters in order

Returns

alpha_counts, numpoints_counts, params_list
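The tallying step can be sketched with `collections.Counter`. The sketch assumes each entry of `shape_params` is an `(alpha, num_points)` pair; the real layout of the entries in a protein pool file may differ, and the helper name `tally_shape_params` is hypothetical.

```python
from collections import Counter


def tally_shape_params(shape_params):
    """Count how often each alpha and each num_points value occurs.

    Hypothetical helper: assumes entries are (alpha, num_points) pairs.
    """
    params_list = [tuple(entry) for entry in shape_params]
    alpha_counts = Counter(alpha for alpha, _ in params_list)
    numpoints_counts = Counter(points for _, points in params_list)
    return alpha_counts, numpoints_counts, params_list


# Illustrative values only.
shape_params = [(0.80, 60), (0.85, 80), (0.85, 60), (0.90, 100)]
alpha_counts, numpoints_counts, params_list = tally_shape_params(shape_params)
print(dict(alpha_counts))
print(dict(numpoints_counts))
```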

get_shape_distributions(data, debug=False)

Parse protein pool file and get shape distributions based on parameter combinations.

Parameters

data – loaded protein pool .pkl

Returns

shapes_plot, alphas_packed, numpoints_packed

get_unique_fracs(counts, dataname)

Fraction of the total counts that each unique parameter represents in the current protein pool.

Parameters
  • counts – counts of each current parameter

  • dataname – name of protein pool

Returns

unique, fracs, barwidth
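The fraction computation amounts to dividing each count by the total. The sketch below assumes a `{value: count}` dictionary as input and uses a hypothetical helper name; it deliberately leaves out `barwidth`, since the documentation above does not say how that value is derived.

```python
def unique_fractions(counts):
    """Return sorted unique values and the fraction of the total each represents.

    Hypothetical helper; the {value: count} input is an assumption.
    """
    total = sum(counts.values())
    unique = sorted(counts)
    fracs = [counts[value] / total for value in unique]
    return unique, fracs


# Illustrative counts only.
alpha_counts = {0.80: 2, 0.85: 5, 0.90: 3}
unique, fracs = unique_fractions(alpha_counts)
for value, frac in zip(unique, fracs):
    print(f"alpha={value}: {frac:.2f} of the pool")
```

Fractions computed this way sum to one, which makes pools of different sizes directly comparable.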

plot_shapes_and_params(plot_pub=False, debug=False)

Plot a 2D array of example shapes generated using all desired alpha and num_points parameter combinations, and plot two histograms opposite the axes, one for each parameter.