Dataloader

src.dataloader.create_dataloaders(batchsize=1024, img_size=128, path=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/distributed-malaria-detection/checkouts/latest/src/../data/Classification'), random_background=False, num_workers=0, percentage_of_dataset=numpy.array, balance=numpy.array)

Convenience function to set up dataloaders for experiments.

Parameters

batchsize (int) – number of samples per batch
img_size (int) – side length in pixels to which images are resized; images are assumed to be square
path (str or pathlib.Path) – root directory containing one folder of images per class; the default points to data/Classification next to the src package
random_background (bool) – whether to randomize the uniform image backgrounds
num_workers (int) – number of subprocesses used for data loading; 0 loads the data in the main process
percentage_of_dataset (tuple or numpy.ndarray) – percentage of the dataset assigned to each split
balance (list) – list of lists with the desired class probabilities for each of the resulting datasets

Returns

torch.utils.data.DataLoader objects
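A minimal sketch of how dataset splits can be wrapped in DataLoaders, assuming the splits come from torch.utils.data.random_split and that only the first (training) split is shuffled. The helper make_loaders_sketch, the split sizes and the random TensorDataset standing in for the image folders are hypothetical, not the project's code.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split


def make_loaders_sketch(splits, batch_size=1024, num_workers=0):
    """Hypothetical helper: one DataLoader per split, shuffling only the first."""
    return [
        DataLoader(split, batch_size=batch_size, shuffle=(i == 0), num_workers=num_workers)
        for i, split in enumerate(splits)
    ]


# Random stand-in for the image data: 100 RGB images of 64x64 pixels, binary labels.
dataset = TensorDataset(torch.rand(100, 3, 64, 64), torch.randint(0, 2, (100,)))
splits = random_split(dataset, [70, 15, 15], generator=torch.Generator().manual_seed(0))
train_loader, val_loader, test_loader = make_loaders_sketch(splits, batch_size=32)

images, labels = next(iter(train_loader))
print(images.shape, labels.shape)
```

Shuffling only the training split keeps the order of validation and test batches fixed between epochs; with num_workers > 0 the batches are loaded in worker subprocesses instead of the main process.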

src.dataloader.create_dataset(path, data_augmentation)

Convenience function that builds a torch dataset from a root directory with one folder of samples per class.

Parameters

path (str) – path to the root directory, which contains one folder of samples per class
data_augmentation – torchvision transforms object applied to each sample

Returns

torch dataset

src.dataloader.get_data_augmentation(random_background, img_size, dark_background=True)

Builds the standard composition of transforms used for data augmentation.

Parameters

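random_background (bool) – whether to randomize the image background
img_size (int) – side length to which input images are scaled; the output images are square (img_size × img_size)
dark_background (bool) – if True, the background has value 0 (black); otherwise it has value 255 (white)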

Returns

composed transform (data_augmentation)

src.dataloader.get_labels_and_class_counts(labels_list)

Calculates the counts of all unique classes.

Parameters

labels_list (list or ndarray) – list or ndarray of labels

Returns

the labels and the number of samples in each unique class
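Counting the classes takes a single call to numpy.unique. A minimal sketch, assuming the function returns the labels as an array together with the per-class counts ordered by class label (the exact return format is not documented here); get_labels_and_class_counts_sketch is a hypothetical name.

```python
import numpy as np


def get_labels_and_class_counts_sketch(labels_list):
    """Return the labels as an array and the number of samples per unique class."""
    labels = np.asarray(labels_list)
    # np.unique sorts the distinct labels; return_counts=True adds their frequencies.
    _, class_counts = np.unique(labels, return_counts=True)
    return labels, class_counts


labels, class_counts = get_labels_and_class_counts_sketch([1, 0, 1, 1, 0, 1])
print(class_counts)  # [2 4]
```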

src.dataloader.randomize_background(x, dark_background=True)

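Randomizes the uniform background around the cells. Images in the malaria datasets have a default background around the cells, and LIME indicates that the neural networks use the shape of this background to make predictions; randomizing it is meant to keep the networks from relying on it.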

Parameters

x (torch.tensor) – input image
dark_background (bool) – if True, the background has value 0; if False, it has value 255

Returns

image with a randomized background
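A minimal numpy sketch of the idea, assuming a CxHxW uint8 image whose background pixels have exactly the value 0 (or 255) in every channel, and replacing them with one random color. The project works on torch tensors and may randomize differently (for example with per-pixel noise); randomize_background_sketch is hypothetical.

```python
import numpy as np


def randomize_background_sketch(x, dark_background=True, rng=None):
    """Give the uniform background of a CxHxW uint8 image one random color."""
    rng = np.random.default_rng() if rng is None else rng
    bg_value = 0 if dark_background else 255
    # A pixel is treated as background when every channel equals bg_value.
    mask = np.all(x == bg_value, axis=0)
    out = x.copy()
    color = rng.integers(0, 256, size=x.shape[0]).astype(np.uint8)
    out[:, mask] = color[:, None]  # broadcast one color over all background pixels
    return out


image = np.zeros((3, 8, 8), dtype=np.uint8)  # dark background
image[:, 2:6, 2:6] = 120                     # a square stand-in for a cell
randomized = randomize_background_sketch(image, rng=np.random.default_rng(0))
```

With this rule, cell pixels that happen to equal the background value in every channel are randomized as well.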

src.dataloader.resample(target_list, imbal_class_prop)

Resamples the indices to create an artificially imbalanced dataset. Adapted from ptrblck’s fork of the PyTorch tutorials: https://github.com/ptrblck/tutorials/blob/imbalanced_tutorial/intermediate_source/imbalanced_data_tutorial.py#L297

Parameters

target_list (list) – labels
imbal_class_prop (list) – list of lists containing the desired class distributions

Returns

indices that satisfy the class probabilities given in imbal_class_prop
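A minimal sketch of resampling to a single target distribution, assuming labels 0..K-1 and sampling without replacement: it keeps the largest number of samples for which every class c can supply the fraction class_prop[c]. This downsampling rule is an assumption, not necessarily the one used by the project or by ptrblck's tutorial, and resample_sketch handles one distribution, whereas imbal_class_prop holds one per dataset.

```python
import numpy as np


def resample_sketch(target_list, class_prop, rng=None):
    """Return indices whose labels follow the class distribution class_prop."""
    rng = np.random.default_rng() if rng is None else rng
    targets = np.asarray(target_list)
    class_indices = [np.flatnonzero(targets == c) for c in range(len(class_prop))]
    # Largest total for which every class c can supply the fraction class_prop[c].
    total = min(len(idx) / p for idx, p in zip(class_indices, class_prop) if p > 0)
    chosen = [
        # Sample without replacement; the epsilon guards against float round-off.
        rng.choice(idx, size=int(total * p + 1e-9), replace=False)
        for idx, p in zip(class_indices, class_prop)
    ]
    return np.concatenate(chosen)


targets = np.array([0] * 80 + [1] * 20)
indices = resample_sketch(targets, [0.5, 0.5], rng=np.random.default_rng(0))
print(len(indices), np.bincount(targets[indices]))  # 40 [20 20]
```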

src.dataloader.set_prop_dataset(datasets, targets, balance)

Creates datasets with a given balance of classes

Parameters

datasets – subsets of a dataset (e.g. the splits returned by split_dataset)
targets – labels of the original, i.e. unsplit, dataset
balance – list of lists containing the desired probabilities of each class in the given datasets

Returns

modified datasets
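Because torch Subset objects created by random_split store indices into the original dataset, the labels needed for balancing are looked up in the unsplit dataset's targets. A minimal sketch, assuming each split is downsampled without replacement to the largest size that its class probabilities allow; set_prop_dataset_sketch and the toy data are hypothetical.

```python
import numpy as np
import torch
from torch.utils.data import Subset, TensorDataset, random_split


def set_prop_dataset_sketch(subsets, targets, balance, seed=0):
    """Downsample each Subset so its class proportions follow the matching row of balance."""
    rng = np.random.default_rng(seed)
    targets = np.asarray(targets)
    balanced = []
    for subset, props in zip(subsets, balance):
        # Subset.indices point into the original (unsplit) dataset, so the
        # labels of this split are looked up in the original targets.
        idx = np.asarray(subset.indices)
        per_class = [idx[targets[idx] == c] for c in range(len(props))]
        # Largest split size for which every class can supply its share.
        total = min(len(ix) / p for ix, p in zip(per_class, props) if p > 0)
        # The small epsilon guards against round-off such as 19.999999999999996.
        keep = np.concatenate([
            rng.choice(ix, size=int(total * p + 1e-9), replace=False)
            for ix, p in zip(per_class, props)
        ])
        balanced.append(Subset(subset.dataset, keep.tolist()))
    return balanced


labels = torch.tensor([0] * 80 + [1] * 20)
dataset = TensorDataset(torch.arange(100), labels)
splits = random_split(dataset, [50, 50], generator=torch.Generator().manual_seed(0))
train_set, test_set = set_prop_dataset_sketch(
    splits, labels.numpy(), balance=[[0.5, 0.5], [0.8, 0.2]])
```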

src.dataloader.split_dataset(dataset, percentage_of_dataset)

Splits a dataset into parts whose sizes are given by the percentages in percentage_of_dataset.

Parameters

dataset – torch Dataset to be split
percentage_of_dataset (list or numpy.ndarray) – percentage of the dataset assigned to each split

Returns

torch.utils.data.Subset objects
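A minimal sketch built on torch.utils.data.random_split, assuming percentage_of_dataset holds fractions that sum to 1 (divide by 100 first if whole percentages are used). random_split needs integer lengths that add up to len(dataset), so the rounding leftover goes to the last split; split_dataset_sketch is hypothetical.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, random_split


def split_dataset_sketch(dataset, percentage_of_dataset, seed=0):
    """Randomly split dataset into Subsets sized by the given fractions."""
    n = len(dataset)
    fractions = np.asarray(percentage_of_dataset, dtype=float)
    # Floor each split size, then hand the rounding remainder to the last split
    # so that the sizes add up to len(dataset), as random_split requires.
    lengths = np.floor(fractions * n).astype(int)
    lengths[-1] += n - lengths.sum()
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, lengths.tolist(), generator=generator)


dataset = TensorDataset(torch.arange(101))
train_set, val_set, test_set = split_dataset_sketch(dataset, [0.7, 0.15, 0.15])
print(len(train_set), len(val_set), len(test_set))  # 70 15 16
```

Recent PyTorch releases also accept fractions directly in random_split; computing explicit integer lengths as above works on older versions too.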