CNN -> Datasets

Image Augmentation

class tardis_em.cnn.datasets.augmentation.CenterCrop(size: tuple)

CenterCrop is used to crop the center region of 2D or 3D image arrays.

The purpose of this class is to allow cropping of image arrays to a predefined central size for further processing, training, or analysis. The class supports both 2D and 3D image formats. The usage involves initializing the crop size and then calling the instance with the input image array and an optional mask array.
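The cropping idea can be sketched with plain NumPy slicing. This is a minimal illustration, not the library's implementation: the function name center_crop and its argument order are hypothetical, and it assumes size has one entry per image dimension.

```python
import numpy as np


def center_crop(image, size, mask=None):
    """Crop the central region of a 2D or 3D array (illustrative only)."""
    if image.ndim != len(size):
        raise ValueError("size must have one entry per image dimension")

    # For each axis, start the window so that it is centered in the array.
    slices = tuple(
        slice((dim - s) // 2, (dim - s) // 2 + s)
        for dim, s in zip(image.shape, size)
    )
    if mask is None:
        return image[slices]
    # The same window is applied to the mask so both stay aligned.
    return image[slices], mask[slices]


image = np.arange(36).reshape(6, 6)
cropped = center_crop(image, (4, 4))
volume, volume_mask = center_crop(
    np.zeros((8, 10, 12)), (4, 4, 4), mask=np.ones((8, 10, 12))
)
```

Applying one set of slices to both arrays is what keeps the image and its mask registered after cropping.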

class tardis_em.cnn.datasets.augmentation.RandomFlip

RandomFlip flips 2D or 3D images and their corresponding label masks along a randomly chosen axis. It is commonly used for data augmentation when training machine learning models on image data.
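A random flip can be sketched in a few lines of NumPy. The helper below is hypothetical (random_flip and its rng argument are not part of tardis_em) and only shows the key point: the image and the mask must be flipped along the same axis.

```python
import numpy as np


def random_flip(image, mask=None, rng=None):
    """Flip image (and mask) along one randomly chosen axis (illustrative only)."""
    rng = np.random.default_rng() if rng is None else rng
    # Pick one axis out of the 2 or 3 available.
    axis = int(rng.integers(0, image.ndim))
    flipped = np.flip(image, axis=axis)
    if mask is None:
        return flipped
    # Reuse the same axis for the mask to keep labels aligned.
    return flipped, np.flip(mask, axis=axis)


image = np.arange(24).reshape(2, 3, 4)
mask = image > 10
out_image, out_mask = random_flip(image, mask, rng=np.random.default_rng(0))
```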

class tardis_em.cnn.datasets.augmentation.RandomRotation

Randomize rotation.

This class provides a mechanism to apply random rotations to 2D/3D image arrays and their corresponding label mask arrays. It can determine the rotation angle and direction randomly, then apply the transformation to the input data. The main purpose is to augment the data by introducing random rotations.
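As a rough sketch of this kind of augmentation, the snippet below rotates by a random multiple of 90 degrees in a random direction using np.rot90. Restricting the rotation to 90 degree steps in the last two axes is an assumption made for the example, and random_rotation is a hypothetical name, not the library's API.

```python
import numpy as np


def random_rotation(image, mask=None, rng=None):
    """Rotate image (and mask) by a random multiple of 90 degrees (illustrative only)."""
    rng = np.random.default_rng() if rng is None else rng
    k = int(rng.integers(1, 4))            # 1, 2 or 3 quarter turns
    direction = int(rng.choice([-1, 1]))   # clockwise or counter-clockwise
    # Rotate in the plane of the last two axes for both 2D and 3D input.
    axes = (image.ndim - 2, image.ndim - 1)
    rotated = np.rot90(image, k=k * direction, axes=axes)
    if mask is None:
        return rotated
    return rotated, np.rot90(mask, k=k * direction, axes=axes)


image = np.arange(16).reshape(4, 4)
mask = image > 5
out_image, out_mask = random_rotation(image, mask, rng=np.random.default_rng(1))
```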

class tardis_em.cnn.datasets.augmentation.ComposeRandomTransformation(transformations: list)

ComposeRandomTransformation applies a random sequence of transformations from a given list to the input data. The number of transformations applied in a sequence is determined randomly between 1 and 3. This class is designed to augment data by applying non-deterministic transformations to images and their corresponding label masks during preprocessing.
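The composition logic described above can be sketched as follows. All names here (apply_random_sequence, flip_rows, rotate_quarter) are hypothetical, and the sketch assumes each transformation is a callable that takes and returns an (image, mask) pair and that the same transformation may be drawn more than once.

```python
import random

import numpy as np


def apply_random_sequence(transformations, image, mask, rng=None):
    """Apply 1 to 3 randomly chosen transformations in sequence (illustrative only)."""
    rng = random.Random() if rng is None else rng
    n_steps = rng.randint(1, 3)  # randint is inclusive on both ends
    applied = []
    for _ in range(n_steps):
        transform = rng.choice(transformations)
        image, mask = transform(image, mask)
        applied.append(transform.__name__)
    return image, mask, applied


def flip_rows(image, mask):
    return np.flip(image, axis=0), np.flip(mask, axis=0)


def rotate_quarter(image, mask):
    return np.rot90(image), np.rot90(mask)


image = np.arange(16).reshape(4, 4)
mask = image % 2 == 0
out_image, out_mask, applied = apply_random_sequence(
    [flip_rows, rotate_quarter], image, mask, rng=random.Random(0)
)
```

Returning the list of applied steps is only a convenience for inspection in this example.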

tardis_em.cnn.datasets.augmentation.preprocess(image: numpy.ndarray, transformation: bool, size: tuple | int | None = <class 'int'>, mask: numpy.ndarray | None = None, output_dim_mask=1) -> Tuple[ndarray, ndarray] | ndarray

Preprocesses an image with optional transformations, resizing, and mask handling. The function supports both 2D and 3D images. It resizes the image and optionally a mask to the desired size, applies random transformations, and prepares the data for model input by expanding its dimensions as needed.

Parameters:
  • image – The input image to preprocess. Must be a NumPy array of 2 or 3 dimensions.

  • transformation – A boolean flag to determine whether random transformations such as flipping and rotation should be applied to the image.

  • size – The target size for resizing the image. Can be an integer for uniform scaling or a tuple for specific dimensions.

  • mask – Optional. A mask corresponding to the input image. Must match the dimensions of the image.

  • output_dim_mask – Optional. The number of desired output dimensions for the mask. Default is 1.

Returns:

If a mask is provided, returns a tuple containing the preprocessed image and mask. If no mask is provided, returns only the preprocessed image.

Raises:

Exception – Raised when a tuple passed as size does not describe uniform scaling, i.e., its dimensions are non-uniform or inconsistent.
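One way to read the size handling is sketched below. It assumes that "uniform" means every entry of the tuple is equal, which the text above does not state explicitly; normalize_size is a hypothetical helper, and it raises ValueError where the library is documented to raise a generic Exception.

```python
def normalize_size(size, ndim):
    """Return a per-axis size tuple (illustrative only)."""
    if isinstance(size, int):
        # An integer is expanded to the same size on every axis.
        return (size,) * ndim
    size = tuple(size)
    # Assumption: a tuple is accepted only if all entries are identical.
    if len(set(size)) != 1:
        raise ValueError(f"Non-uniform size {size} is not supported")
    return size


cubic = normalize_size(64, 3)
square = normalize_size((32, 32), 2)
```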

DataLoader

class tardis_em.cnn.datasets.dataloader.CNNDataset(img_dir: str, mask_dir: str, size=64, mask_suffix='_mask', transform=True, out_channels=1)

Handles dataset creation and processing for Convolutional Neural Network (CNN) training and inference.

This class manages the loading, processing, and formatting of image and mask data needed for training CNN models. It includes normalization, size adjustments, and optional transformations to prepare data for further model usage.
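The mask_suffix argument suggests that each image is paired with a mask file by name. A minimal sketch of such pairing is shown below. The exact naming rule (suffix inserted before the file extension) is an assumption, and mask_name_for is a hypothetical helper rather than part of CNNDataset.

```python
from pathlib import PurePath


def mask_name_for(image_name, mask_suffix="_mask"):
    """Derive the mask file name for an image file name (illustrative only)."""
    path = PurePath(image_name)
    # Assumption: the suffix is inserted between the stem and the extension.
    return f"{path.stem}{mask_suffix}{path.suffix}"


paired = mask_name_for("patch_0.tif")
custom = mask_name_for("patch_1.mrc", mask_suffix="_label")
```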

class tardis_em.cnn.datasets.dataloader.PredictionDataset(img_dir: str, out_channels=1)

Manages loading, processing, and serving of datasets for image prediction tasks.

This class loads images from a specified directory, then preprocesses and formats them into tensors compatible with convolutional neural networks. Individual items can be retrieved by index. It is used to prepare input for model prediction.

Dataset Builder

tardis_em.cnn.datasets.build_dataset.build_train_dataset(dataset_dir: str, circle_size: int, resize_pixel_size: float | None, trim_xy: int, trim_z: int, benchmark=False, correct_pixel_size=None)

Build a training dataset by processing image files and their corresponding masks.

This function performs a series of pre-processing steps: validating files, loading images and masks, calculating scaling factors from the pixel size, handling different mask formats, and generating the training dataset with any necessary transformations. Progress and any errors encountered are written to a log file during processing.

Parameters:
  • dataset_dir (str) – Directory path containing the dataset files

  • circle_size (int) – Size of the circle used for mask drawing

  • resize_pixel_size (Union[float, NoneType]) – Target pixel size for resizing the images, if specified

  • trim_xy (int) – Number of pixels to trim from the x and y dimensions during processing

  • trim_z (int) – Number of pixels to trim from the z dimension during processing

  • benchmark (bool) – Flag indicating whether to keep certain processing details for benchmark datasets

  • correct_pixel_size (Union[float, NoneType]) – Optional, explicitly specifies the correct pixel size to be used

Returns:

None

Return type:

NoneType
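The scaling step of build_train_dataset can be illustrated with simple arithmetic. The sketch below assumes the scaling factor is the ratio of the image pixel size to resize_pixel_size, and that correct_pixel_size, when given, replaces the pixel size read from the file. Both helper names are hypothetical and the library may compute this differently.

```python
def scale_factor(pixel_size, resize_pixel_size=None, correct_pixel_size=None):
    """Return the factor by which image dimensions are multiplied (illustrative only)."""
    if correct_pixel_size is not None:
        # Assumption: an explicit value overrides the one stored in the file.
        pixel_size = correct_pixel_size
    if resize_pixel_size is None:
        return 1.0  # no resizing requested
    return pixel_size / resize_pixel_size


def scaled_shape(shape, factor):
    """Apply the factor to every axis, keeping at least one pixel per axis."""
    return tuple(max(1, round(dim * factor)) for dim in shape)


factor = scale_factor(10.0, resize_pixel_size=20.0)
new_shape = scaled_shape((100, 200), factor)
```

Under these assumptions, an image sampled at 10.0 per pixel and resized to 20.0 per pixel shrinks to half its size along each axis.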

tardis_em.cnn.datasets.build_dataset.load_img_mask_data(image: str, mask: str) -> Tuple[ndarray, ndarray, ndarray]

Load image and mask data from the supported file formats (MRC/REC, Amira, TIFF, and CSV). Where the mask is given as a coordinate file, the function processes the coordinate data instead of a pixel mask. Normalization and scaling are optionally applied to the image as part of loading. The function returns the processed image, the mask or coordinate data, and the pixel size.

Parameters:
  • image (str) – Path to the image file. Supported formats are .mrc, .rec, .map, .am, .tif, and .tiff.

  • mask (str) – Path to the mask file. Supported formats are _mask.mrc, _mask.rec, _mask.am, _mask.csv, .CorrelationLines.am, and _mask.tif.

Returns:

A tuple containing the normalized image, the loaded mask or coordinate data, and the pixel size (a float, or None).

Return type:

Tuple[numpy.ndarray, numpy.ndarray, float or None]
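The supported formats listed above can be checked before loading with a small helper like the one below. This is a hedged sketch: the constants and function names are hypothetical, the lists are copied from the parameter descriptions, and the actual loader may validate paths differently.

```python
IMAGE_EXTENSIONS = (".mrc", ".rec", ".map", ".am", ".tif", ".tiff")
MASK_ENDINGS = (
    "_mask.mrc",
    "_mask.rec",
    "_mask.am",
    "_mask.csv",
    ".CorrelationLines.am",
    "_mask.tif",
)


def is_supported_image(path):
    """True if the path ends with one of the documented image extensions."""
    return path.lower().endswith(IMAGE_EXTENSIONS)


def mask_ending(path):
    """Return the documented mask ending matched by the path, or None."""
    for ending in MASK_ENDINGS:
        if path.endswith(ending):
            return ending
    return None


image_ok = is_supported_image("tomo_01.mrc")
csv_mask = mask_ending("tomo_01_mask.csv")
amira_coords = mask_ending("tomo_01.CorrelationLines.am")
not_a_mask = mask_ending("tomo_01.am")
```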

tardis_em.cnn.datasets.build_dataset.error_log_build_data(dir_name: str, log_file: ndarray, id_i: int, i: str) -> ndarray

Stores error data into a log file and saves the updated log file.

This function updates the specified log file with error data corresponding to a given ID and an identifier string. After updating the log file, it saves the file in the specified directory.

Parameters:
  • dir_name (str) – The directory location where the updated log file should be saved.

  • log_file (np.ndarray) – A two-dimensional NumPy array representing the log file where error information will be stored.

  • id_i (int) – The integer identifier used to locate and store error data within the log file.

  • i (str) – The identifier string associated with the specific error or data entry being logged.

Returns:

The updated log file as a two-dimensional NumPy array after storing the new data.

Return type:

np.ndarray
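The update-and-save pattern can be sketched with NumPy as follows. The log layout (one row per error holding the ID and the identifier string), the file name log.csv, and the function name store_error are all assumptions made for this example. The real function may index rows by id_i and use a different file name or format.

```python
import os
import tempfile

import numpy as np


def store_error(dir_name, log_file, id_i, name):
    """Append one error entry to a 2D log array and save it (illustrative only)."""
    new_row = np.array([[str(id_i), name]])
    # Assumption: each error adds one (id, name) row to the log.
    log_file = np.vstack([log_file, new_row])
    # Hypothetical file name; every entry is written as plain text.
    np.savetxt(
        os.path.join(dir_name, "log.csv"), log_file, fmt="%s", delimiter=","
    )
    return log_file


with tempfile.TemporaryDirectory() as tmp_dir:
    log = np.empty((0, 2), dtype=str)
    log = store_error(tmp_dir, log, 3, "tomo_03.mrc")
    log = store_error(tmp_dir, log, 7, "tomo_07.mrc")
    saved = os.path.isfile(os.path.join(tmp_dir, "log.csv"))
```

Returning the updated array mirrors the documented behavior, so the caller can keep passing the same log through successive calls.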