Dataloader

src.dataloader.create_dataloaders(batchsize=1024, img_size=128, path=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/distributed-malaria-detection/checkouts/latest/src/../data/Classification'), random_background=False, num_workers=0, percentage_of_dataset=numpy.array, balance=numpy.array)

Convenience function to set up the data loaders for experiments

Parameters
  • batchsize (int) – size of batch

  • img_size (int) – image size in pixels; images are assumed to be square

  • path (str) – path to folders containing images

  • random_background (bool) – whether to randomize uniform backgrounds or not

  • num_workers (int) – number of worker subprocesses used for data loading (0 loads the data in the main process)

  • percentage_of_dataset (tuple) – percentages of each split

  • balance (list) – list of lists containing the wanted probabilities of each class in the given datasets

Returns

torch.utils.data.DataLoader objects

src.dataloader.create_dataset(path, data_augmentation)

Convenience function that creates the dataset for this project from a root directory with one folder per class

Parameters
  • path (str) – path to the root directory containing one folder of samples for each class

  • data_augmentation – torch transforms object

Returns

torch dataset

src.dataloader.get_data_augmentation(random_background, img_size, dark_background=True)

Standard composed transformations for data augmentation

Parameters
  • random_background (bool) – Randomize background

  • img_size (int) – Scales input images to square img_size

  • dark_background (bool) – if True, the background has value 0; otherwise it has value 255

Returns

composed data augmentation transforms

src.dataloader.get_labels_and_class_counts(labels_list)

Calculates the counts of all unique classes.

Parameters

labels_list (list|ndarray) – list or ndarray with labels
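
The counting described above can be sketched with the standard library alone. This is an illustration only, not the project's implementation: the name labels_and_class_counts and the choice to return sorted labels together with their counts are assumptions made for this example.

```python
from collections import Counter


def labels_and_class_counts(labels_list):
    """Return the sorted unique labels and how often each one occurs.

    Hypothetical stand-in written for illustration; the return format
    (two parallel lists) is an assumption.
    """
    counts = Counter(labels_list)
    labels = sorted(counts)
    return labels, [counts[label] for label in labels]


labels, class_counts = labels_and_class_counts([0, 1, 1, 0, 1, 2])
print(labels)        # [0, 1, 2]
print(class_counts)  # [2, 3, 1]
```

Class counts like these are typically the input for deciding how strongly each class has to be over- or undersampled.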

Returns

src.dataloader.randomize_background(x, dark_background=True)

Randomizes the background of an image. Images in the malaria datasets have default backgrounds around the cells, and LIME indicates that neural networks use the shape of the background to make predictions.

Parameters
  • x (torch.tensor) – input image

  • dark_background (bool) – background value: True for 0, False for 255

Returns

Image with randomized background
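
The idea can be sketched on a plain nested list standing in for a grayscale image. This is a hedged illustration, not the project's code: the function name, the seed argument, and the assumption that background pixels are exactly 0 (dark) or 255 (light) and are each replaced by an independent random intensity are all made up for this example. The real function operates on a torch tensor.

```python
import random


def randomize_background_sketch(image, dark_background=True, seed=None):
    """Replace background pixels with random intensities in [0, 255].

    Assumption: a pixel is background if it equals 0 (dark_background=True)
    or 255 (dark_background=False). All other pixels are left unchanged.
    """
    rng = random.Random(seed)
    background = 0 if dark_background else 255
    return [
        [rng.randint(0, 255) if pixel == background else pixel for pixel in row]
        for row in image
    ]


# 3x3 toy image: one "cell" pixel surrounded by a dark background
image = [[0, 0, 0], [0, 120, 0], [0, 0, 0]]
out = randomize_background_sketch(image, seed=1)
print(out[1][1])  # 120, the cell pixel is untouched
```

Because the background no longer has a constant value, a model trained on such images cannot rely on the background's outline as a shortcut.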

src.dataloader.resample(target_list, imbal_class_prop)

Resample the indices to create an artificially imbalanced dataset. Function adapted from ptrblck’s PyTorch tutorials fork: https://github.com/ptrblck/tutorials/blob/imbalanced_tutorial/intermediate_source/imbalanced_data_tutorial.py#L297

Parameters
  • target_list (list) – labels

  • imbal_class_prop (list) – list of lists containing the desired class distributions

Returns

indices to satisfy the probabilities given in imbal_class_prop
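
A minimal sketch of index resampling for a single dataset, using only the standard library. It is not the project's or ptrblck's code: the name resample_indices, the seed argument, and the interpretation of each proportion as "fraction of that class to keep" are assumptions for illustration. The documented function takes a list of lists, presumably one entry per dataset.

```python
import random


def resample_indices(target_list, class_prop, seed=0):
    """Pick indices so that class c keeps a fraction class_prop[c] of its samples.

    Assumption: labels are integers 0..K-1 and class_prop[c] is in [0, 1].
    """
    rng = random.Random(seed)
    # Group sample indices by their class label
    by_class = {}
    for idx, label in enumerate(target_list):
        by_class.setdefault(label, []).append(idx)
    chosen = []
    for label, indices in sorted(by_class.items()):
        n_keep = int(len(indices) * class_prop[label])
        chosen.extend(rng.sample(indices, n_keep))  # sample without replacement
    return sorted(chosen)


targets = [0] * 10 + [1] * 10
idx = resample_indices(targets, class_prop=[1.0, 0.2])
kept = [targets[i] for i in idx]
print(kept.count(0), kept.count(1))  # 10 2
```

The returned indices could then be used to build a subset of the original dataset, for example with torch.utils.data.Subset.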

src.dataloader.set_prop_dataset(datasets, targets, balance)

Creates datasets with a given balance of classes

Parameters
  • datasets – subsets of datasets

  • targets – labels of the original, i.e. unsplit, dataset

  • balance – list of lists containing the wanted probabilities of each class in the given datasets

Returns

modified datasets

src.dataloader.split_dataset(dataset, percentage_of_dataset)

Separate dataset into parts with percentages specified in percentage_of_dataset

Parameters
  • dataset – torch Dataset to be split

  • percentage_of_dataset (list) – list or numpy array with percentage of each split

Returns

torch subsets
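
The splitting step can be sketched on plain indices. This is an illustration under stated assumptions, not the project's implementation: the name split_indices and the seed argument are hypothetical, the values are assumed to be fractions that sum to 1, and any rounding remainder is assigned to the last split.

```python
import random


def split_indices(n_samples, percentage_of_dataset, seed=0):
    """Split range(n_samples) into shuffled, disjoint parts sized by fractions."""
    if abs(sum(percentage_of_dataset) - 1.0) > 1e-8:
        raise ValueError("percentages must sum to 1")
    lengths = [int(n_samples * p) for p in percentage_of_dataset]
    lengths[-1] = n_samples - sum(lengths[:-1])  # remainder goes to the last split
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    parts, start = [], 0
    for length in lengths:
        parts.append(indices[start:start + length])
        start += length
    return parts


train, val, test = split_indices(100, (0.8, 0.1, 0.1))
print(len(train), len(val), len(test))  # 80 10 10
```

With PyTorch, index lists like these can be wrapped in torch.utils.data.Subset, or the lengths can be passed to torch.utils.data.random_split directly.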