Global -> SetUp Handlers

Dataset organizers

tardis_em.utils.dataset.find_filtered_files(directory, prefix='instances_filter', format_b='csv')

Finds and retrieves a list of files in the specified directory based on a given prefix and format(s). This function can search for files matching the prefix along with a single format or multiple formats, returning a list of filtered file paths.

Parameters:
  • directory (str) – The directory path in which the search will be performed.

  • prefix (str, optional) – The prefix that filtered files should match. Defaults to ‘instances_filter’.

  • format_b (str or list or tuple, optional) – The format or formats (as string or a list/tuple of strings) to search for matching files. Defaults to ‘csv’.

Returns:

A list of file paths that match the specified prefix and format(s) in the directory.

Return type:

list
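The prefix-and-format filtering described above can be illustrated with the standard library alone. This is a minimal sketch, not the library's implementation: `find_by_prefix_and_format` is a hypothetical stand-in, and it assumes that "matching the prefix" means the file name starts with it and that formats are plain file extensions. The documentation does not state the exact matching rule.

```python
import os
import tempfile


def find_by_prefix_and_format(directory, prefix="instances_filter", format_b="csv"):
    """Hypothetical stand-in: list files whose names start with `prefix`
    and end with one of the given extensions (assumed matching rule)."""
    # Accept a single extension or a list/tuple of extensions.
    formats = (format_b,) if isinstance(format_b, str) else tuple(format_b)
    suffixes = tuple("." + f.lstrip(".") for f in formats)

    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.startswith(prefix) and name.endswith(suffixes)
    )


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("instances_filter_a.csv", "instances_filter_b.npy", "other.csv"):
        open(os.path.join(tmp, name), "w").close()

    # Default format ("csv") versus several formats at once.
    single = [os.path.basename(p) for p in find_by_prefix_and_format(tmp)]
    multi = [
        os.path.basename(p)
        for p in find_by_prefix_and_format(tmp, format_b=("csv", "npy"))
    ]
```

Normalizing `format_b` to a tuple first lets the same code path serve both the single-format and the multi-format case.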

tardis_em.utils.dataset.move_train_dataset(dir_s: str, coord_format: tuple, with_img: bool, img_format: tuple | None = None)

Moves and organizes a training dataset by placing coordinate files and optionally image files into appropriate subdirectories.

Parameters:
  • dir_s (str) – The directory containing dataset files to be moved.

  • coord_format (tuple) – The file extension or format of coordinate files to be processed.

  • with_img (bool) – A flag indicating whether to include associated image files during dataset organization.

  • img_format (tuple, optional) – The file extension or format of image files to be processed if with_img is True. Defaults to None.

Returns:

None

Return type:

None

Raises:
  • TardisError – If no coordinate files matching coord_format are found in the given directory.

  • TardisError – If with_img is True but no image files matching img_format are found in the given directory.
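The move-and-validate behaviour described above might look roughly like the following sketch. Everything named here is an assumption for illustration: `organize_train_files` and `DatasetError` are hypothetical stand-ins (the latter for `TardisError`), and the `train/masks` and `train/imgs` target folders are an assumed layout that the documentation does not specify.

```python
import os
import shutil
import tempfile


class DatasetError(Exception):
    """Hypothetical stand-in for the library's TardisError."""


def organize_train_files(dir_s, coord_format, with_img, img_format=None):
    names = sorted(os.listdir(dir_s))

    # Coordinate files are mandatory; fail early when none are found.
    coords = [n for n in names if n.endswith(tuple(coord_format))]
    if not coords:
        raise DatasetError(f"No coordinate files {coord_format} in {dir_s}")

    # Image files are only required when with_img is set.
    images = []
    if with_img:
        images = [
            n for n in names
            if n not in coords and n.endswith(tuple(img_format or ()))
        ]
        if not images:
            raise DatasetError(f"No image files {img_format} in {dir_s}")

    # Assumed layout: <dir_s>/train/masks and <dir_s>/train/imgs.
    for sub, files in (("masks", coords), ("imgs", images)):
        if not files:
            continue
        target = os.path.join(dir_s, "train", sub)
        os.makedirs(target, exist_ok=True)
        for n in files:
            shutil.move(os.path.join(dir_s, n), os.path.join(target, n))


# Demonstration: one coordinate file, one image, one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.csv", "a.mrc", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    organize_train_files(tmp, (".csv",), with_img=True, img_format=(".mrc",))
    moved_masks = sorted(os.listdir(os.path.join(tmp, "train", "masks")))
    moved_imgs = sorted(os.listdir(os.path.join(tmp, "train", "imgs")))
    left_behind = sorted(
        n for n in os.listdir(tmp) if os.path.isfile(os.path.join(tmp, n))
    )

# An empty directory triggers the "no coordinate files" error.
with tempfile.TemporaryDirectory() as empty:
    try:
        organize_train_files(empty, (".csv",), with_img=False)
        raised = False
    except DatasetError:
        raised = True
```

Both checks run before any file is moved, so a failed call leaves the directory untouched.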

tardis_em.utils.dataset.build_test_dataset(dataset_dir: str, dataset_no: int, stanford=False)

Builds a test dataset by reorganizing and moving files from the train dataset directory to a test dataset directory based on specific selection logic. This function handles dataset creation for both the general case and a special case for Stanford datasets. Images and corresponding masks are moved into a new test directory, ensuring the train directory is properly split into train and test datasets.

Parameters:
  • dataset_dir (str) – Path to the directory containing the dataset.

  • dataset_no (int) – Number of datasets or partitions to consider when splitting for test data.

  • stanford (bool) – If True, applies specific folder organization and file movement logic for Stanford datasets. Defaults to False.

Returns:

None

Environment setup

tardis_em.utils.setup_envir.build_new_dir(dir_s: str)

Builds a directory for temporary output files and ensures required files are present within the specified directory. If the required conditions are not met or the output directory already exists, appropriate actions are taken.

Parameters:

dir_s (str) – Path to the base directory where the temporary directory will be created.

Returns:

None

tardis_em.utils.setup_envir.build_temp_dir(dir_s: str)

Creates the directories required for processing in the specified base directory. The function first checks whether a “temp” subdirectory and its nested structure exist within the provided dir_s. If “temp” and its subdirectories already exist, they are reset to a clean state; otherwise, they are created. The function also verifies that a top-level “Predictions” directory exists and creates it if it does not.

Parameters:

dir_s (str) – The base directory where the directories need to be created or reset.

Returns:

None
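The reset-or-create logic can be sketched with `shutil` and `os`. This is an illustrative stand-in rather than the library's code: `reset_temp_tree` is a hypothetical name, and the “Patches” and “Predictions” subdirectories of “temp” are assumed from the `clean_up` description in this section.

```python
import os
import shutil
import tempfile


def reset_temp_tree(dir_s, subdirs=("Patches", "Predictions")):
    """Hypothetical sketch: rebuild <dir_s>/temp from scratch and make sure
    a top-level <dir_s>/Predictions directory exists."""
    temp = os.path.join(dir_s, "temp")

    # Reset to a clean state by dropping any previous temp tree.
    if os.path.isdir(temp):
        shutil.rmtree(temp)

    # Recreate the (assumed) nested structure.
    for sub in subdirs:
        os.makedirs(os.path.join(temp, sub))

    # The top-level output directory is created only if it is missing.
    os.makedirs(os.path.join(dir_s, "Predictions"), exist_ok=True)


# Demonstration: a stale file from an earlier run disappears after the reset.
with tempfile.TemporaryDirectory() as tmp:
    stale = os.path.join(tmp, "temp", "Patches", "stale.txt")
    os.makedirs(os.path.dirname(stale))
    open(stale, "w").close()

    reset_temp_tree(tmp)

    stale_removed = not os.path.exists(stale)
    created = sorted(os.listdir(os.path.join(tmp, "temp")))
    has_predictions = os.path.isdir(os.path.join(tmp, "Predictions"))
```

Removing the whole “temp” tree and recreating it is simpler than emptying each subdirectory, at the cost of deleting anything else stored under “temp”.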

tardis_em.utils.setup_envir.clean_up(dir_s: str)

Removes a directory named “temp” and its subdirectories, including “Patches” and “Predictions”, from the specified directory path. Ensures recursive deletion of all files and subdirectories inside the “temp” directory.

Parameters:

dir_s (str) – The path to the parent directory containing the “temp” directory to be cleaned up.

Returns:

None
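A recursive cleanup of this kind is commonly written with `shutil.rmtree`. The sketch below uses a hypothetical `remove_temp_tree` helper and assumes that only the “temp” tree is deleted; whether the library also touches anything outside “temp” is not stated here.

```python
import shutil
import tempfile
from pathlib import Path


def remove_temp_tree(dir_s):
    """Hypothetical sketch: delete <dir_s>/temp and everything below it."""
    temp = Path(dir_s) / "temp"
    # Guard so that calling the helper twice does not raise.
    if temp.is_dir():
        shutil.rmtree(temp)


# Demonstration: temp/ is removed, a sibling directory is left alone.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "temp" / "Patches").mkdir(parents=True)
    (root / "temp" / "Predictions").mkdir()
    (root / "temp" / "Patches" / "p0.npy").touch()
    (root / "Predictions").mkdir()

    remove_temp_tree(tmp)
    remove_temp_tree(tmp)  # second call is a no-op

    temp_gone = not (root / "temp").exists()
    predictions_kept = (root / "Predictions").is_dir()
```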

tardis_em.utils.setup_envir.check_dir(dir_s: str, train_img: str, train_mask: str, test_img: str, test_mask: str, with_img: bool, img_format: tuple | str, mask_format: tuple | str | None) -> bool

Validates the structure and contents of a dataset directory to ensure proper organization for training and testing processes. This function checks for the existence of specific subdirectories (train and test), as well as for the presence and consistency of files (both images and masks) in the dataset. It ensures that the number of image files matches the number of mask files, and optionally validates file formats.

Parameters:
  • dir_s (str) – Root directory of the dataset to validate.

  • train_img (str) – Path to the subdirectory containing training images.

  • train_mask (str) – Path to the subdirectory containing training masks.

  • test_img (str) – Path to the subdirectory containing test images.

  • test_mask (str) – Path to the subdirectory containing test masks.

  • with_img (bool) – Indicates whether the validation should check for the presence of both images and masks, or just masks alone.

  • img_format (Union[tuple, str]) – File extension(s) for image files. It can be either a tuple of extensions or a single extension as a string.

  • mask_format (Union[tuple, str, None]) – File extension(s) for mask files. It can be a tuple of extensions, a single extension as a string, or None to disable mask format validation.

Returns:

A boolean indicating whether the directory structure and contents meet the validation criteria.

Return type:

bool
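The checks listed above (directories exist, masks are present, image and mask counts agree) can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's validation logic: `looks_like_valid_dataset` is a hypothetical name, the four subdirectory arguments are treated as full paths, and an empty mask folder is treated as invalid.

```python
import os
import tempfile


def looks_like_valid_dataset(dir_s, train_img, train_mask, test_img, test_mask,
                             with_img, img_format, mask_format):
    def listed(path, formats):
        # None signals a missing directory; formats=None disables filtering.
        if not os.path.isdir(path):
            return None
        names = os.listdir(path)
        if formats is None:
            return names
        formats = (formats,) if isinstance(formats, str) else tuple(formats)
        return [n for n in names if n.endswith(formats)]

    if not os.path.isdir(dir_s):
        return False

    for img_dir, mask_dir in ((train_img, train_mask), (test_img, test_mask)):
        masks = listed(mask_dir, mask_format)
        if not masks:  # missing directory or no matching mask files
            return False
        if with_img:
            imgs = listed(img_dir, img_format)
            # Every mask needs exactly one image counterpart (by count).
            if imgs is None or len(imgs) != len(masks):
                return False
    return True


# Demonstration with an assumed train/test x imgs/masks layout.
with tempfile.TemporaryDirectory() as tmp:
    paths = [
        os.path.join(tmp, split, kind)
        for split in ("train", "test")
        for kind in ("imgs", "masks")
    ]  # order: train/imgs, train/masks, test/imgs, test/masks
    for p in paths:
        os.makedirs(p)
        ext = ".mrc" if p.endswith("imgs") else ".csv"
        open(os.path.join(p, "sample" + ext), "w").close()

    args = (tmp, *paths, True, ".mrc", (".csv",))
    balanced = looks_like_valid_dataset(*args)

    # An extra training image breaks the image/mask count match.
    open(os.path.join(paths[0], "extra.mrc"), "w").close()
    unbalanced = looks_like_valid_dataset(*args)
```

Comparing counts only is a cheap consistency check; it does not confirm that each image is paired with the mask of the same name.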