Dataloader
-
src.dataloader.create_dataloaders(batchsize=1024, img_size=128, path=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/distributed-malaria-detection/checkouts/latest/src/../data/Classification'), random_background=False, num_workers=0, percentage_of_dataset=numpy.array, balance=numpy.array)
Convenience function to set up the dataloaders for experiments. The default path points to the data/Classification directory next to src/; the absolute path shown was resolved when the documentation was built.
- Parameters
batchsize (int) – number of samples per batch
img_size (int) – side length, in pixels, to which images are resized; images are assumed to be square
path (str) – path to the root directory containing one folder of images per class
random_background (bool) – whether to randomize the uniform image backgrounds
num_workers (int) – number of worker subprocesses used for data loading (as in torch.utils.data.DataLoader); 0 loads data in the main process
percentage_of_dataset (tuple) – percentage of the dataset assigned to each split
balance (list) – list of lists with the desired probability of each class in each resulting dataset
- Returns
torch.utils.data.DataLoader objects
-
src.dataloader.create_dataset(path, data_augmentation)
Convenience function that builds a dataset from a root directory with one folder of samples per class.
- Parameters
path (str) – path to the root directory containing one folder of samples per class
data_augmentation – torchvision transform (e.g. a composed transform) applied to each sample
- Returns
torch.utils.data.Dataset
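This page does not say how create_dataset indexes the directory. Assuming it follows the common torchvision ImageFolder convention (one subfolder per class, class indices assigned by sorted folder name), the indexing step can be sketched with the standard library alone. index_class_folders is a hypothetical helper, not part of src.dataloader; image decoding and data_augmentation are omitted.

```python
from pathlib import Path


def index_class_folders(root):
    """Hypothetical sketch: map each class folder under `root` to an integer label.

    Assumes one subfolder per class; labels follow sorted folder names,
    as in torchvision's ImageFolder. Returns (samples, class_to_idx), where
    samples is a list of (file_path, label) pairs.
    """
    root = Path(root)
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    samples = []
    for name in classes:
        # Collect every regular file in the class folder, in a deterministic order.
        for file in sorted((root / name).iterdir()):
            if file.is_file():
                samples.append((file, class_to_idx[name]))
    return samples, class_to_idx
```

Sorting folder names keeps the label assignment stable across machines, since directory listing order is not guaranteed by the filesystem.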
-
src.dataloader.get_data_augmentation(random_background, img_size, dark_background=True)
Returns the standard composed transformations used for data augmentation.
- Parameters
random_background (bool) – whether to randomize the image background
img_size (int) – side length to which input images are resized, giving img_size × img_size images
dark_background (bool) – if True, the background has value 0; otherwise it has value 255
- Returns
data_augmentation – the composed transformations
-
src.dataloader.get_labels_and_class_counts(labels_list)
Calculates the number of occurrences of each unique class.
- Parameters
labels_list (list|ndarray) – list or ndarray with labels
- Returns
the labels and the count of each unique class
-
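A minimal sketch of the counting step for get_labels_and_class_counts, assuming NumPy is available and that each unique label is reported together with its number of occurrences. The name labels_and_counts and the dictionary return format are hypothetical; the project's actual return format may differ.

```python
import numpy as np


def labels_and_counts(labels_list):
    """Hypothetical sketch: return the labels as an array plus the count of each unique class."""
    labels = np.asarray(labels_list)
    # np.unique returns the sorted unique values and, with return_counts=True,
    # how often each one occurs.
    classes, counts = np.unique(labels, return_counts=True)
    return labels, dict(zip(classes.tolist(), counts.tolist()))
```

For example, labels_and_counts([0, 1, 1, 0, 1]) yields the counts {0: 2, 1: 3}.
-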
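src.dataloader.randomize_background(x, dark_background=True)
Randomizes the uniform background surrounding the cells. Images in the malaria dataset have a default background around each cell, and LIME explanations indicate that the networks use the shape of this background to make predictions; randomizing it removes that cue.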
- Parameters
x (torch.Tensor) – input image
dark_background (bool) – background value: 0 if True, 255 if False
- Returns
Image with randomized background
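One way to randomize the background, sketched under assumptions this page does not state: the image is a NumPy array of shape (channels, height, width) with values in [0, 255], and a pixel belongs to the background exactly when every channel equals the background value (0 for a dark background, 255 otherwise). The function name and the use of uniform noise are hypothetical; the project may randomize differently.

```python
import numpy as np


def randomize_background_sketch(x, dark_background=True, rng=None):
    """Hypothetical sketch: replace uniform background pixels with random noise.

    x: array of shape (C, H, W) with values in [0, 255].
    A pixel counts as background when all channels equal the background value.
    """
    rng = np.random.default_rng() if rng is None else rng
    background_value = 0 if dark_background else 255
    # Boolean mask of shape (H, W): True where every channel is background.
    mask = np.all(x == background_value, axis=0)
    out = x.copy()
    # Integer noise in 0..255, cast to the input dtype.
    noise = rng.integers(0, 256, size=x.shape).astype(x.dtype)
    # Overwrite only the background pixels, in every channel.
    out[:, mask] = noise[:, mask]
    return out
```

Cell pixels are left unchanged, so only the background shape information is destroyed.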
-
src.dataloader.resample(target_list, imbal_class_prop)
Resamples the indices to create an artificially imbalanced dataset. Adapted from ptrblck’s fork of the PyTorch tutorials: https://github.com/ptrblck/tutorials/blob/imbalanced_tutorial/intermediate_source/imbalanced_data_tutorial.py#L297
- Parameters
target_list (list) – label of each sample
imbal_class_prop (list) – list of lists containing the desired class distributions
- Returns
indices of the selected samples, chosen so that the class proportions match those given in imbal_class_prop
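The following sketch shows one way to pick indices that match a desired class distribution. It is a simplified, hypothetical variant of resample: it takes a single probability vector rather than a list of lists, assumes labels are the integers 0..K-1, and keeps as many samples as possible by scaling to the class that is scarcest relative to its target probability.

```python
import numpy as np


def resample_indices(target_list, class_prop, rng=None):
    """Hypothetical sketch: sample indices whose class distribution follows class_prop.

    target_list: label of each sample (labels are 0..K-1).
    class_prop: desired probability of each class, summing to 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    targets = np.asarray(target_list)
    class_prop = np.asarray(class_prop, dtype=float)
    counts = np.bincount(targets, minlength=len(class_prop))
    # Largest total size n such that n * p_c <= count_c for every class with p_c > 0.
    wanted = class_prop > 0
    n_total = np.min(counts[wanted] / class_prop[wanted])
    chosen = []
    for cls, prop in enumerate(class_prop):
        n_cls = int(np.floor(n_total * prop))
        cls_indices = np.flatnonzero(targets == cls)
        # Draw without replacement so no sample appears twice.
        chosen.append(rng.choice(cls_indices, size=n_cls, replace=False))
    return np.sort(np.concatenate(chosen))
```

With 80 samples of class 0, 20 of class 1 and class_prop = [0.5, 0.5], class 1 limits the total to 40, so 20 indices of each class are returned.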
-
src.dataloader.set_prop_dataset(datasets, targets, balance)
Creates datasets with a given balance of classes.
- Parameters
datasets – subsets of the original dataset
targets – labels of the original, i.e. unsplit, dataset
balance – list of lists containing the desired probability of each class in each of the given datasets
- Returns
the datasets, resampled to the requested class balance
-
src.dataloader.split_dataset(dataset, percentage_of_dataset)
Splits the dataset into parts whose sizes are given by the percentages in percentage_of_dataset.
- Parameters
dataset – torch Dataset to be split
percentage_of_dataset (list or numpy.ndarray) – percentage of the dataset assigned to each split
- Returns
torch.utils.data.Subset objects
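Turning percentages into split sizes needs care so that the sizes add up exactly to the dataset length. A sketch under the assumption that percentage_of_dataset holds fractions summing to 1 (e.g. 0.8, 0.1, 0.1); the helper name split_indices is hypothetical, and samples lost to rounding are assigned to the last split. The resulting index arrays could then be wrapped in torch.utils.data.Subset.

```python
import numpy as np


def split_indices(n_samples, percentage_of_dataset, rng=None):
    """Hypothetical sketch: shuffle 0..n_samples-1 and cut it into parts.

    percentage_of_dataset: fractions summing to 1, e.g. (0.8, 0.1, 0.1).
    """
    rng = np.random.default_rng() if rng is None else rng
    fractions = np.asarray(percentage_of_dataset, dtype=float)
    lengths = np.floor(fractions * n_samples).astype(int)
    # Give the samples lost to rounding down to the last split.
    lengths[-1] += n_samples - lengths.sum()
    permutation = rng.permutation(n_samples)
    # Cut points are the running totals of all lengths except the last.
    return np.split(permutation, np.cumsum(lengths)[:-1])
```

For 101 samples and (0.8, 0.1, 0.1) this gives parts of 80, 10 and 11 indices. Recent PyTorch releases of torch.utils.data.random_split also accept fractions directly.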