DIST -> DataLoader

General DataLoader

General class for creating Datasets. It works by detecting all files of the specified formats in the given directory and returning the index list.

class tardis_em.dist_pytorch.datasets.dataloader.BasicDataset(coord_dir=None, coord_format='.csv', patch_if=500, downscale=None, rgb=False, benchmark=False, train=True)

BASIC CLASS FOR STANDARD DATASET CONSTRUCTION

Parameters:
  • coord_dir (str) – Dataset directory.

  • coord_format (tuple, str) – A tuple of allowed coord formats.

  • patch_if (int) – Max number of points per patch.

  • train (bool) – If True, compute as for training dataset, else test.
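The detection step described above can be pictured as a directory scan filtered by file extension. The following is a minimal sketch, assuming a flat directory and plain suffix matching; `list_coord_files` is a hypothetical helper used only for illustration and is not part of tardis_em.

```python
import os
import tempfile


def list_coord_files(coord_dir, coord_format=(".csv",)):
    """Return sorted file names in coord_dir that end with an allowed format.

    Hypothetical helper for illustration; not the tardis_em implementation.
    """
    # Accept a single extension string as well as a tuple of extensions.
    if isinstance(coord_format, str):
        coord_format = (coord_format,)
    return sorted(
        f for f in os.listdir(coord_dir) if f.endswith(tuple(coord_format))
    )


# Usage: build a throwaway directory and index only the .csv files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.csv", "b.csv", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    ids = list_coord_files(tmp, ".csv")
    print(ids)  # ['a.csv', 'b.csv']
```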

save_temp(i: int, **kwargs)

General class function to save temp data.

Parameters:
  • i (int) – Temp index value.

  • kwargs (np.ndarray) – Dictionary of all arrays to save.

load_temp(i: int, **kwargs) List[ndarray]

General class function to load temp data

Parameters:
  • i (int) – Temp index value.

  • kwargs (bool) – Dictionary of all arrays to load.

Returns:

List of kwargs arrays as tensor arrays.

Return type:

List (list[np.ndarray])
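The save/load pair can be illustrated with a small round trip. This is only a sketch: the scratch directory, the `<name>_<i>.npy` naming scheme, and the reading of the boolean kwargs as "load this array" flags are all assumptions, not details confirmed by the source.

```python
import os
import tempfile

import numpy as np

# Assumed scratch location; the real class may store temp data elsewhere.
TEMP_DIR = tempfile.mkdtemp()


def save_temp(i, **kwargs):
    """Save each keyword array as <name>_<i>.npy (assumed naming scheme)."""
    for name, array in kwargs.items():
        np.save(os.path.join(TEMP_DIR, f"{name}_{i}.npy"), array)


def load_temp(i, **kwargs):
    """Load the arrays whose keyword flag is True, in keyword order."""
    return [
        np.load(os.path.join(TEMP_DIR, f"{name}_{i}.npy"))
        for name, wanted in kwargs.items()
        if wanted
    ]


save_temp(0, coords=np.zeros((4, 3)), graph=np.eye(4))
coords, graph = load_temp(0, coords=True, graph=True)
print(coords.shape, graph.shape)  # (4, 3) (4, 4)
```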

static list_to_tensor(**kwargs) List[List[ndarray]]

Static class function to transform a list of numpy arrays to a list of tensor arrays.

Parameters:

kwargs (np.ndarray) – Dictionary of all files to transform into a tensor.

Returns:

List of kwargs arrays as tensor arrays.

Return type:

List (list[torch.Tensor])

Specialized DataLoaders

Filament structure DataSet

class tardis_em.dist_pytorch.datasets.dataloader.FilamentDataset(**kwargs)

FILAMENT-TYPE DATASET CONSTRUCTION

Returns:

coords_idx: Numpy or Tensor list of coordinates (N, (2, 3)).

df_idx: Normalize zero-out output for standardized dummy.

graph_idx: Numpy or Tensor list of 2D GT graphs.

output_idx: Numpy or Tensor list (N, 1) of output index value.

df_idx: Normalize zero-out output for standardized dummy.

Return type:

Tuple (list[np.ndarray])

PartNet synthetic DataSet

class tardis_em.dist_pytorch.datasets.dataloader.PartnetDataset(**kwargs)

PARTNET TYPE DATASET CONSTRUCTION

Returns:

coords_idx: Numpy or Tensor list of coordinates (N, (2, 3)).

df_idx: Normalize zero-out output for standardized dummy.

graph_idx: Numpy or Tensor list of 2D GT graphs.

output_idx: Numpy or Tensor list (N, 1) of output index value.

df_idx: Normalize zero-out output for standardized dummy.

Return type:

Tuple (list[np.ndarray])

ScanNet V2 synthetic DataSet

class tardis_em.dist_pytorch.datasets.dataloader.ScannetDataset(**kwargs)

SCANNET V2 TYPE DATASET CONSTRUCTION

Returns:

coords_idx: Numpy or Tensor list of coordinates (N, (2, 3)).

df_idx: Normalize zero-out output for standardized dummy.

graph_idx: Numpy or Tensor list of 2D GT graphs.

output_idx: Numpy or Tensor list (N, 1) of output index value.

df_idx: Normalize zero-out output for standardized dummy.

Return type:

Tuple (list[np.ndarray])

ScanNet V2 with RGB values synthetic DataSet

class tardis_em.dist_pytorch.datasets.dataloader.ScannetColorDataset(**kwargs)

SCANNET V2 + COLORS TYPE DATASET CONSTRUCTION

Returns:

coords_idx: Numpy or Tensor list of coordinates (N, (2, 3)).

rgb_idx: Numpy or Tensor list of RGB values (N, 3).

graph_idx: Numpy or Tensor list of 2D GT graphs.

output_idx: Numpy or Tensor list (N, 1) of output index value.

df_idx: Normalize zero-out output for standardized dummy.

Return type:

Tuple (list[np.ndarray])

Stanford S3DIS DataSet

class tardis_em.dist_pytorch.datasets.dataloader.Stanford3DDataset(**kwargs)

S3DIS TYPE DATASET CONSTRUCTION

Returns:

coords_idx: Numpy or Tensor list of coordinates (N, (2, 3)).

rgb_idx: Numpy or Tensor list of RGB values (N, 3).

graph_idx: Numpy or Tensor list of 2D GT graphs.

output_idx: Numpy or Tensor list (N, 1) of output index value.

df_idx: Normalize zero-out output for standardized dummy.

Return type:

Tuple (list[np.ndarray])

Helper Functions

Build dataloader

tardis_em.dist_pytorch.datasets.dataloader.build_dataset(dataset_type: str | list, dirs: list, max_points_per_patch: int, downscale=None, benchmark=False)

Wrapper for DataLoader

Function that wraps all data loaders and returns only the one requested for the given dataset type.

Parameters:
  • dataset_type (str, list) – Dataset type to recognize and process.

  • dirs (list) – List of directories given as [train, test].

  • max_points_per_patch (int) – Max number of points per patch.

  • downscale (None, float) – Overwrite the downscale factor.

  • benchmark (bool) – If True, construct data for benchmark.

Returns:

Output DataLoader with the specified dataset for training and evaluation.

Return type:

Tuple[torch.DataLoader, torch.DataLoader]
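The wrapper behaviour described above, one entry point that returns a train/test pair for the requested dataset type, can be sketched as a simple lookup. Everything here is illustrative: the stub classes, the `REGISTRY` mapping, its keywords, and `build_dataset_sketch` are hypothetical stand-ins and do not reproduce the tardis_em code.

```python
class FilamentStub:
    """Stand-in for a dataset class; records how it was constructed."""

    def __init__(self, coord_dir, patch_if, train):
        self.coord_dir, self.patch_if, self.train = coord_dir, patch_if, train


class ScannetStub(FilamentStub):
    """Second stand-in, to show that the lookup selects a class."""


# Hypothetical mapping from a dataset_type keyword to a dataset class.
REGISTRY = {"filament": FilamentStub, "scannet": ScannetStub}


def build_dataset_sketch(dataset_type, dirs, max_points_per_patch):
    """Return (train, test) datasets for the requested dataset_type."""
    try:
        cls = REGISTRY[dataset_type]
    except KeyError:
        raise ValueError(f"Unknown dataset type: {dataset_type!r}") from None
    train_dir, test_dir = dirs
    return (
        cls(train_dir, max_points_per_patch, train=True),
        cls(test_dir, max_points_per_patch, train=False),
    )


train_dl, test_dl = build_dataset_sketch(
    "filament", ["data/train", "data/test"], 500
)
print(type(train_dl).__name__, train_dl.train, test_dl.train)
# FilamentStub True False
```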

Point Cloud Augmentation

tardis_em.dist_pytorch.datasets.augmentation.preprocess_data(coord: str | ndarray, image: str | None = None, size: int | None = None, include_label=True, normalization='simple') Tuple[ndarray, ndarray] | Tuple[ndarray, ndarray, ndarray]

Data augmentation function.

Given any supported coordinate file, the function processes it with optional image data. If image data is used, the image output is a list of flattened image patches of a specified size. Additionally, the graph output can be created.

Parameters:
  • coord (str, ndarray) – Path to the file containing coordinate data, or an array of coordinates.

  • image (str, None) – Path to a supported image file.

  • size (int, None) – Image patch size.

  • include_label (bool) – If True, output the coordinate array with label ids.

  • normalization (str) – Type of image normalization.

Returns:

Returns coordinates and optionally graph patch list.

Return type:

Tuple[np.ndarray, np.ndarray]

class tardis_em.dist_pytorch.datasets.augmentation.BuildGraph(K=2, mesh=False)

GRAPH REPRESENTATION FROM 2D/3D COORDINATES

The main class for building a graph representation of any given point cloud based on the labeling information and, optionally, point distances.

The BuildGraph class outputs a 2D array with the graph representation. For filament-like structures, each node is allowed a maximum of 2 connections. For object-like structures, the number of interactions per node is capped at 4. The graph representation for a mesh-like object is computed by identifying all points in the class and searching for the 4 nearest neighbors (KNN) of each node inside the class.

Parameters:

K (int) – Maximum number of connections per node.
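For the filament case, the limit of 2 connections per node follows naturally if each point is linked only to its predecessor and successor on the same filament. The sketch below illustrates that idea under stated assumptions: column 0 of the input holds the filament label, and the points of one filament appear consecutively and in order. `filament_graph` is a hypothetical function, not the BuildGraph implementation.

```python
import numpy as np


def filament_graph(coord_with_label):
    """Build an (N, N) adjacency matrix for filament-like point clouds.

    Assumes column 0 holds the filament label and that points of one
    filament appear consecutively, ordered along the filament.
    """
    labels = coord_with_label[:, 0]
    n = len(labels)
    graph = np.zeros((n, n), dtype=np.float32)

    # Link each point to its successor when both share a label, so every
    # node ends up with at most 2 connections (predecessor and successor).
    same = np.flatnonzero(labels[:-1] == labels[1:])
    graph[same, same + 1] = 1
    graph[same + 1, same] = 1
    return graph


# Two filaments: label 0 with three points, label 1 with two points.
pc = np.array(
    [[0, 0, 0, 0], [0, 1, 0, 0], [0, 2, 0, 0], [1, 5, 5, 0], [1, 6, 5, 0]],
    dtype=float,
)
graph = filament_graph(pc)
print(graph.sum(axis=1).tolist())  # [1.0, 2.0, 1.0, 1.0, 1.0]
```

Only the middle point of the first filament has two neighbors; filament end points have one, and no edge crosses between labels.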

class tardis_em.dist_pytorch.datasets.augmentation.Crop2D3D(image: ndarray, size: tuple, normalization)

2D/3D IMAGE CROPPING

Center-crops the given image to the specified size.

Parameters:
  • image (np.ndarray) – Image array.

  • size (tuple) – Uniform cropping size.

  • normalization (class) – Normalization type.

static get_xyz_position(center_point: int, size: int, max_size: int) Tuple[int, int]

Given the center point, calculate the range to crop.

Parameters:
  • center_point (int) – XYZ coordinate for center point.

  • size (int) – Crop size.

  • max_size (int) – Maximum size of the axis, used to calculate the offset.

Returns:

Min and max integer values giving the crop position on the axis.

Return type:

Tuple[int, int]
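The range calculation can be sketched per axis as follows. The border handling shown here (shifting the window back inside the axis rather than padding) is an assumption of this example, as is the requirement that `size` does not exceed `max_size`; `crop_range` is a hypothetical name.

```python
def crop_range(center_point, size, max_size):
    """Return (start, stop) of a `size`-wide window around center_point.

    Assumed behavior: the window is shifted back inside [0, max_size]
    when it would cross either border. Assumes size <= max_size.
    """
    start = center_point - size // 2
    stop = start + size

    if start < 0:  # window crosses the lower border
        start, stop = 0, size
    elif stop > max_size:  # window crosses the upper border
        start, stop = max_size - size, max_size
    return start, stop


print(crop_range(50, 32, 100))  # (34, 66)
print(crop_range(5, 32, 100))   # (0, 32)
print(crop_range(95, 32, 100))  # (68, 100)
```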

tardis_em.dist_pytorch.datasets.augmentation.upsample_pc(org_coord: ndarray, sampled_coord: ndarray)

upsample_pc _summary_

Parameters:
  • org_coord (np.ndarray) – _description_

  • sampled_coord (np.ndarray) – _description_

Patch dataset

class tardis_em.dist_pytorch.datasets.patches.PatchDataSet(max_number_of_points=500, overlap=0.15, drop_rate=None, graph=True, tensor=True)

BUILD PATCHED DATASET

Main change in v0.1.0RC3
  • Build moved to 3D patches

Class for computing the optimal patch size for a maximum number of points per patch. The optimal patch size is determined by the maximum number of points. It works by first calculating the bounding box, which is used to build 3D voxels. The voxel size is initialized and then reduced until the point cloud can be cut into patches with at most ‘max_number_of_points’ points each. In the end, patches with a smaller number of points are merged with their neighbors in a way that respects the ‘max_number_of_points’ policy.

Output is given as a list of arrays as torch.Tensor or np.ndarray.

Parameters:
  • max_number_of_points (int) – Maximum allowed number of points per patch.

  • overlap (float) – Overlap between voxels, given as a fraction of the voxel size.

  • drop_rate (float) – Optimizer step size for reducing the size of patches.

  • graph (bool) – If True output computed graph for each patch of point cloud.

  • tensor (bool) – If True output all datasets as torch.Tensor.

boundary_box(coord, offset=None) ndarray

Utility class function to compute the boundary box in 2D or 3D.

Returns:

Boundary box dimensions

Return type:

np.ndarray

static center_patch(bbox, voxel_size=1) ndarray

Creates a regular grid within a bounding box.

Parameters:
  • bbox – list or tuple of 6 floats representing the bounding box as (xmin, ymin, zmin, xmax, ymax, zmax)

  • voxel_size – float representing the size of each voxel

Returns:

Array of shape (N, 3) with the center coordinates of each voxel.

Return type:

np.ndarray
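A regular grid of voxel centers inside a bounding box can be sketched with `np.meshgrid`. Placing the first center half a voxel inside the lower corner, and rounding the voxel count up per axis, are assumptions of this example; `voxel_centers` is a hypothetical name.

```python
import numpy as np


def voxel_centers(bbox, voxel_size=1.0):
    """Return an (N, 3) array of voxel centers tiling the bounding box.

    bbox is (xmin, ymin, zmin, xmax, ymax, zmax). Placing the first center
    half a voxel inside the lower corner is an assumption of this sketch.
    """
    mins = np.asarray(bbox[:3], dtype=float)
    maxs = np.asarray(bbox[3:], dtype=float)

    # Number of voxels needed per axis (at least one, even for flat axes).
    counts = np.maximum(np.ceil((maxs - mins) / voxel_size), 1).astype(int)

    axes = [
        mins[d] + voxel_size * (np.arange(counts[d]) + 0.5) for d in range(3)
    ]
    grid = np.meshgrid(*axes, indexing="ij")
    return np.stack([g.ravel() for g in grid], axis=1)


centers = voxel_centers((0, 0, 0, 4, 2, 2), voxel_size=2)
print(centers.tolist())  # [[1.0, 1.0, 1.0], [3.0, 1.0, 1.0]]
```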

points_in_patch(coord: ndarray, patch_center: ndarray) bool

Utility class function for filtering a point cloud and returning only the points inside a patch.

Parameters:
  • coord (np.ndarray) – 3D coordinate array.

  • patch_center (np.ndarray) – Array (1, 3) for the given patch center.

Returns:

Array of all points that are enclosed in the given patch.

Return type:

tuple(bool)
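Selecting the points of one patch amounts to a per-axis range test around the patch center. In the sketch below the voxel size is passed explicitly, whereas the class method presumably reads it from stored state; that, the cubic patch shape, and the inclusive borders are assumptions of this example.

```python
import numpy as np


def points_in_patch(coord, patch_center, voxel_size):
    """Return a boolean mask of the points inside a cubic patch.

    voxel_size is passed explicitly here; in the class it would
    presumably come from stored state (an assumption of this sketch).
    """
    half = voxel_size / 2.0
    lower = patch_center - half
    upper = patch_center + half
    # A point is inside when every axis falls within [lower, upper].
    return np.all((coord >= lower) & (coord <= upper), axis=1)


coord = np.array([[0.5, 0.5, 0.5], [1.9, 0.1, 1.0], [3.0, 1.0, 1.0]])
mask = points_in_patch(coord, np.array([[1.0, 1.0, 1.0]]), voxel_size=2.0)
print(mask.tolist())  # [True, True, False]
print(coord[mask].shape)  # (2, 3)
```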

optimal_patches(coord: ndarray, random=False) List[bool]

Main class function to compute optimal patch size.

The function takes the stored initial values and iteratively searches for a voxel size small enough that every patch holds at most the maximum number of points.

Parameters:
  • coord (np.ndarray) – Array of coordinates to voxelize.

  • random
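The iterative search can be sketched as a loop that shrinks the voxel size until the fullest voxel fits the limit. This is an illustration under stated assumptions: the multiplicative `drop_rate`, the starting voxel size, the `max_iter` guard, and the use of non-overlapping voxels are choices made for this example, and `optimal_voxel_size` is a hypothetical name.

```python
import numpy as np


def optimal_voxel_size(coord, max_points=500, drop_rate=0.9, max_iter=200):
    """Shrink the voxel size until no voxel holds more than max_points.

    Sketch assumptions: the multiplicative drop_rate, the starting size,
    and non-overlapping voxels are choices made for this example.
    """
    mins = coord.min(axis=0)
    # Start with a voxel as large as the longest bounding-box edge.
    voxel_size = float((coord.max(axis=0) - mins).max()) or 1.0

    for _ in range(max_iter):
        # Assign every point to an integer voxel index per axis.
        idx = np.floor((coord - mins) / voxel_size).astype(int)
        _, counts = np.unique(idx, axis=0, return_counts=True)
        if counts.max() <= max_points:
            return voxel_size, counts
        voxel_size *= drop_rate
    raise RuntimeError("No voxel size found within max_iter iterations")


rng = np.random.default_rng(0)
coord = rng.uniform(0, 100, size=(2000, 3))
voxel_size, counts = optimal_voxel_size(coord, max_points=500)
print(counts.max() <= 500, counts.sum())  # True 2000
```

The merge step for under-filled patches mentioned in the class description is not covered by this sketch.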

static normalize_idx(coord_with_idx: ndarray) ndarray

Utility class function to replace ids with ordered output ID values for each point in patches. In other words, it produces a standardized ID for each point, so it can be identified with the source.

Parameters:

coord_with_idx (np.ndarray) – Coordinate id value i.

Returns:

An array of all points in a patch with corrected ID values.

Return type:

np.ndarray
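Replacing arbitrary ids with ordered values can be sketched with `np.unique(..., return_inverse=True)`. The example operates on a bare id column and ranks ids by sorted value; whether the original method works on the full coordinate array or preserves first-appearance order is not stated in the source, so both are assumptions. `normalize_ids` is a hypothetical name.

```python
import numpy as np


def normalize_ids(ids):
    """Map arbitrary ids to consecutive integers 0..K-1.

    Ids are ranked by sorted value here; whether the original keeps
    first-appearance order instead is not stated in the source.
    """
    _, inverse = np.unique(ids, return_inverse=True)
    return inverse


ids = np.array([7, 7, 12, 3, 12])
print(normalize_ids(ids).tolist())  # [1, 1, 2, 0, 2]
```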

output_format(data: ndarray) ndarray | Tensor

Utility class function to output an array in the correct format (numpy or tensor).

Parameters:

data (np.ndarray) – Input data for format change.

Returns:

Array in file format specified by self.torch_output.

Return type:

np.ndarray

patched_dataset(coord: ndarray, label_cls=None, rgb=None, mesh=6, random=False, voxel_size=None) Tuple[List, List, List, List, List] | Tuple[List, List, List, List]