DIST -> DataLoader
General DataLoader
General class for creating Datasets. It works by detecting all files of the specified formats in the given directory and returning the index list.
- class tardis_em.dist_pytorch.datasets.dataloader.BasicDataset(coord_dir=None, coord_format='.csv', patch_if=500, downscale=None, rgb=False, benchmark=False, train=True)
BASIC CLASS FOR STANDARD DATASET CONSTRUCTION
- Parameters:
coord_dir (str) – Dataset directory.
coord_format (tuple, str) – A tuple of allowed coord formats.
patch_if (int) – Max number of points per patch.
train (bool) – If True, compute as for training dataset, else test.
- save_temp(i: int, **kwargs)
General class function to save temp data.
- Parameters:
i (int) – Temp index value.
kwargs (np.ndarray) – Dictionary of all arrays to save.
- load_temp(i: int, **kwargs) List[ndarray]
General class function to load temp data
- Parameters:
i (int) – Temp index value.
kwargs (bool) – Dictionary of all arrays to load.
- Returns:
List of kwargs arrays as NumPy arrays.
- Return type:
List (list[np.ndarray])
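The save/load round trip described above can be sketched as follows. This is a minimal illustration, assuming each keyword array is written to its own `.npy` file named after the key and the index. The `temp_dir` argument, the file-naming scheme, and the standalone function form are hypothetical and are not taken from the library's implementation.

```python
import os
import tempfile
from typing import List

import numpy as np


def save_temp(temp_dir: str, i: int, **kwargs: np.ndarray) -> None:
    """Write each keyword array to '<temp_dir>/<key>_<i>.npy' (hypothetical layout)."""
    for key, array in kwargs.items():
        np.save(os.path.join(temp_dir, f"{key}_{i}.npy"), array)


def load_temp(temp_dir: str, i: int, **kwargs: bool) -> List[np.ndarray]:
    """Load the arrays whose keyword flag is True, in keyword order."""
    return [
        np.load(os.path.join(temp_dir, f"{key}_{i}.npy"))
        for key, wanted in kwargs.items()
        if wanted
    ]


# Round trip: save two arrays under index 0, then load them back.
with tempfile.TemporaryDirectory() as tmp:
    save_temp(tmp, 0, coords=np.zeros((4, 3)), graph=np.eye(4))
    coords, graph = load_temp(tmp, 0, coords=True, graph=True)
```

Passing boolean flags to the loader mirrors the documented `kwargs (bool)` parameter: only the arrays flagged `True` are read back.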
- static list_to_tensor(**kwargs) List[List[ndarray]]
Static class function to transform a list of numpy arrays to a list of tensor arrays.
- Parameters:
kwargs (np.ndarray) – Dictionary of all files to transform into a tensor.
- Returns:
List of kwargs arrays as tensor arrays.
- Return type:
List (list[torch.Tensor])
Specialized DataLoaders
Filament structure DataSet
- class tardis_em.dist_pytorch.datasets.dataloader.FilamentDataset(**kwargs)
FILAMENT-TYPE DATASET CONSTRUCTION
- Returns:
coords_idx: Numpy or Tensor list of coordinates (N, (2, 3)).
df_idx: Normalize zero-out output for standardized dummy.
graph_idx: Numpy or Tensor list of 2D GT graphs.
output_idx: Numpy or Tensor list (N, 1) of output index value.
df_idx: Normalize zero-out output for standardized dummy.
- Return type:
Tuple (list[np.ndarray])
PartNet synthetic DataSet
- class tardis_em.dist_pytorch.datasets.dataloader.PartnetDataset(**kwargs)
PARTNET TYPE DATASET CONSTRUCTION
- Returns:
coords_idx: Numpy or Tensor list of coordinates (N, (2, 3)).
df_idx: Normalize zero-out output for standardized dummy.
graph_idx: Numpy or Tensor list of 2D GT graphs.
output_idx: Numpy or Tensor list (N, 1) of output index value.
df_idx: Normalize zero-out output for standardized dummy.
- Return type:
Tuple (list[np.ndarray])
ScanNet V2 synthetic DataSet
- class tardis_em.dist_pytorch.datasets.dataloader.ScannetDataset(**kwargs)
SCANNET V2 TYPE DATASET CONSTRUCTION
- Returns:
coords_idx: Numpy or Tensor list of coordinates (N, (2, 3)).
df_idx: Normalize zero-out output for standardized dummy.
graph_idx: Numpy or Tensor list of 2D GT graphs.
output_idx: Numpy or Tensor list (N, 1) of output index value.
df_idx: Normalize zero-out output for standardized dummy.
- Return type:
Tuple (list[np.ndarray])
ScanNet V2 with RGB values synthetic DataSet
- class tardis_em.dist_pytorch.datasets.dataloader.ScannetColorDataset(**kwargs)
SCANNET V2 + COLORS TYPE DATASET CONSTRUCTION
- Returns:
coords_idx: Numpy or Tensor list of coordinates (N, (2, 3)).
rgb_idx: Numpy or Tensor list of RGB values (N, 3).
graph_idx: Numpy or Tensor list of 2D GT graphs.
output_idx: Numpy or Tensor list (N, 1) of output index value.
df_idx: Normalize zero-out output for standardized dummy.
- Return type:
Tuple (list[np.ndarray])
Stanford S3DIS DataSet
- class tardis_em.dist_pytorch.datasets.dataloader.Stanford3DDataset(**kwargs)
S3DIS TYPE DATASET CONSTRUCTION
- Returns:
coords_idx: Numpy or Tensor list of coordinates (N, (2, 3)).
rgb_idx: Numpy or Tensor list of RGB values (N, 3).
graph_idx: Numpy or Tensor list of 2D GT graphs.
output_idx: Numpy or Tensor list (N, 1) of output index value.
df_idx: Normalize zero-out output for standardized dummy.
- Return type:
Tuple (list[np.ndarray])
Helper Functions
Build dataloader
- tardis_em.dist_pytorch.datasets.dataloader.build_dataset(dataset_type: str | list, dirs: list, max_points_per_patch: int, downscale=None, benchmark=False)
Wrapper for DataLoader
Function that wraps all data loaders and returns only the one requested for the given dataset type.
- Parameters:
dataset_type (str, list) – Dataset type to recognize and process.
dirs (list) – List of directories given as [train, test].
max_points_per_patch (int) – Max number of points per patch.
downscale (None, float) – Value that overrides the downscale factor.
benchmark (bool) – If True, construct data for benchmarking.
- Returns:
Output DataLoader with the specified dataset for training and evaluation.
- Return type:
Tuple[torch.DataLoader, torch.DataLoader]
Point Cloud Augmentation
- tardis_em.dist_pytorch.datasets.augmentation.preprocess_data(coord: str | ndarray, image: str | None = None, size: int | None = None, include_label=True, normalization='simple') Tuple[ndarray, ndarray] | Tuple[ndarray, ndarray, ndarray]
Data augmentation function.
Given any supported coordinate file, the function processes it together with optional image data. If image data is used, the image output is a list of flattened image patches of the specified size. The graph output can also be created.
- Parameters:
coord (str, ndarray) – Path to the file containing coordinate data, or a coordinate array.
image (str, None) – Path to the supported image file.
size (int, None) – Image patch size.
include_label (bool) – If True output coordinate array with label ids.
normalization (str) – Type of image normalization.
- Returns:
Returns coordinates and optionally graph patch list.
- Return type:
Tuple[np.ndarray, np.ndarray]
- class tardis_em.dist_pytorch.datasets.augmentation.BuildGraph(K=2, mesh=False)
GRAPH REPRESENTATION FROM 2D/3D COORDINATES
The main class for building a graph representation of any given point cloud based on the labeling information and, optionally, point distances.
The BuildGraph class outputs a 2D array containing the graph representation. For a filament-like structure, each node is allowed a maximum of 2 connections. For an object structure, the cap on connections per node is fixed at 4. For a mesh-like object, the graph is computed by identifying all points in a class and searching for the 4 nearest neighbors (KNN) of each node inside that class.
- Parameters:
K (int) – Number of maximum connections per node.
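The filament case, where each node has at most 2 connections, can be sketched as below. This is an illustrative sketch and not the BuildGraph implementation: it assumes rows of `[id, x, y, z]` that are already ordered along each filament, and the function name `filament_graph` is hypothetical.

```python
import numpy as np


def filament_graph(coord_with_id: np.ndarray) -> np.ndarray:
    """Build an (N, N) adjacency matrix from rows of [id, x, y, z].

    Assumption: points are ordered along each filament, so a point is linked
    only to its predecessor and successor carrying the same id.
    """
    ids = coord_with_id[:, 0]
    n = len(ids)
    graph = np.zeros((n, n), dtype=np.float32)
    for j in range(n - 1):
        if ids[j] == ids[j + 1]:  # consecutive points on the same filament
            graph[j, j + 1] = 1
            graph[j + 1, j] = 1
    return graph


# Two filaments: id 0 with three points, id 1 with two points.
points = np.array(
    [
        [0, 0.0, 0.0, 0.0],
        [0, 1.0, 0.0, 0.0],
        [0, 2.0, 0.0, 0.0],
        [1, 0.0, 5.0, 0.0],
        [1, 1.0, 5.0, 0.0],
    ]
)
graph = filament_graph(points)
```

Interior points of a filament end up with two connections and end points with one, so no node exceeds the limit of 2.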
- class tardis_em.dist_pytorch.datasets.augmentation.Crop2D3D(image: ndarray, size: tuple, normalization)
2D/3D IMAGE CROPPING
Center-crops the given image to the specified size.
- Parameters:
image (np.ndarray) – Image array.
size (tuple) – Uniform cropping size.
normalization (class) – Normalization type.
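Center cropping of a 2D or 3D array can be sketched with plain NumPy slicing. The function below is a hypothetical stand-in, not the Crop2D3D class: it omits the normalization step and simply assumes that axes shorter than the requested size are returned uncropped.

```python
import numpy as np


def center_crop(image: np.ndarray, size: tuple) -> np.ndarray:
    """Return the central region of `image` with the requested per-axis size."""
    slices = []
    for axis_len, crop_len in zip(image.shape, size):
        crop_len = min(crop_len, axis_len)  # never ask for more than the axis holds
        start = (axis_len - crop_len) // 2
        slices.append(slice(start, start + crop_len))
    return image[tuple(slices)]


volume = np.arange(6 * 8 * 8).reshape(6, 8, 8)
patch = center_crop(volume, (4, 4, 4))
```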
- static get_xyz_position(center_point: int, size: int, max_size: int) Tuple[int, int]
Given the center point, calculate the range to crop.
- Parameters:
center_point (int) – XYZ coordinate for center point.
size (int) – Crop size.
max_size (int) – Maximum size of the axis, used to calculate the offset.
- Returns:
Min and max integer positions of the crop on the axis.
- Return type:
Tuple[int, int]
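One way to compute such a crop range is sketched below. The shifting rule used here (move the window back inside the axis when it would cross a border) is an assumption made for illustration, and the name `crop_range` is hypothetical. The library's own offset handling may differ.

```python
from typing import Tuple


def crop_range(center_point: int, size: int, max_size: int) -> Tuple[int, int]:
    """Return (start, end) of a window of length `size` around `center_point`.

    Assumption: a window that crosses an axis border is shifted back inside
    the range [0, max_size]. If the axis is shorter than `size`, the whole
    axis is returned.
    """
    start = center_point - size // 2
    end = start + size
    if start < 0:  # window crosses the lower border, shift it right
        end -= start
        start = 0
    if end > max_size:  # window crosses the upper border, shift it left
        start = max(start - (end - max_size), 0)
        end = max_size
    return start, end


inside = crop_range(50, 20, 100)      # (40, 60)
near_start = crop_range(5, 20, 100)   # (0, 20)
near_end = crop_range(95, 20, 100)    # (80, 100)
```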
- tardis_em.dist_pytorch.datasets.augmentation.upsample_pc(org_coord: ndarray, sampled_coord: ndarray)
- Parameters:
org_coord (np.ndarray) – Original coordinate array.
sampled_coord (np.ndarray) – Sampled coordinate array.
Patch dataset
- class tardis_em.dist_pytorch.datasets.patches.PatchDataSet(max_number_of_points=500, overlap=0.15, drop_rate=None, graph=True, tensor=True)
BUILD PATCHED DATASET
- Main change in v0.1.0RC3
Build moved to 3D patches
Class for computing the optimal patch size for a maximum number of points per patch. The optimal patch size is determined by the maximum number of points. It works by first calculating a boundary box, which is used to build 3D voxels. The voxel size is initialized and then reduced until the point cloud can be cut into patches that respect the specified ‘max_number_of_points’. In the end, patches with a smaller number of points are merged with their neighbors in a way that respects the ‘max_number_of_points’ policy.
Output is given as a list of arrays as torch.Tensor or np.ndarray.
- Parameters:
max_number_of_points (int) – Maximum allowed number of points per patch.
overlap (float) – Percentage of overlapping voxel size.
drop_rate (float) – Optimizer step size for reducing the size of patches.
graph (bool) – If True output computed graph for each patch of point cloud.
tensor (bool) – If True output all datasets as torch.Tensor.
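The iterative voxel-size search described above can be sketched as follows. This is a simplified illustration, not the PatchDataSet algorithm: it ignores overlap and the final merging of small patches. The names `points_per_voxel`, `find_voxel_size`, `shrink_factor`, and `max_iter` are hypothetical, and the multiplicative shrinking is an assumption that may not match how `drop_rate` is applied.

```python
import numpy as np


def points_per_voxel(coord: np.ndarray, voxel_size: float) -> np.ndarray:
    """Count how many points fall into each occupied voxel of a regular grid."""
    cells = np.floor((coord - coord.min(axis=0)) / voxel_size).astype(int)
    _, counts = np.unique(cells, axis=0, return_counts=True)
    return counts


def find_voxel_size(
    coord: np.ndarray,
    max_points: int = 500,
    shrink_factor: float = 0.9,
    max_iter: int = 200,
) -> float:
    """Shrink the voxel size until no voxel holds more than `max_points`."""
    coord = np.asarray(coord, dtype=float)
    size = float((coord.max(axis=0) - coord.min(axis=0)).max())
    if size == 0:  # all points coincide, so any voxel size gives the same split
        return 1.0
    for _ in range(max_iter):
        if points_per_voxel(coord, size).max() <= max_points:
            return size
        size *= shrink_factor  # assumed multiplicative reduction step
    raise RuntimeError("No voxel size satisfied the point limit.")


rng = np.random.default_rng(0)
cloud = rng.random((1000, 3))
voxel_size = find_voxel_size(cloud, max_points=100)
```

The search starts from the largest extent of the bounding box, so a point cloud that already fits the limit is returned as a single patch.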
- boundary_box(coord, offset=None) ndarray
Utility class function to compute the boundary box in 2D or 3D.
- Returns:
Boundary box dimensions
- Return type:
np.ndarray
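A boundary box of a point cloud can be sketched as the per-axis minimum and maximum. The return layout used here (a `(2, D)` array holding the lower and upper corners), the meaning given to `offset`, and the name `bounding_box` are assumptions for illustration. The source does not state the exact format of the method's output.

```python
import numpy as np


def bounding_box(coord, offset=None) -> np.ndarray:
    """Return [[min per axis], [max per axis]] for a 2D or 3D point cloud."""
    coord = np.asarray(coord, dtype=float)
    lower = coord.min(axis=0)
    upper = coord.max(axis=0)
    if offset is not None:  # assumed meaning: pad the box on every side
        lower = lower - offset
        upper = upper + offset
    return np.stack([lower, upper])


cloud = np.array([[0.0, 1.0, 2.0], [4.0, 3.0, 1.0]])
bbox = bounding_box(cloud)
padded = bounding_box(cloud, offset=1.0)
```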
- static center_patch(bbox, voxel_size=1) ndarray
Creates a regular grid within a bounding box.
- Parameters:
bbox – list or tuple of 6 floats representing the bounding box as (xmin, ymin, zmin, xmax, ymax, zmax)
voxel_size – float representing the size of each voxel
- Returns:
Array of shape (N, 3) representing the center coordinates of each voxel.
- Return type:
np.ndarray
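A regular grid of voxel centers inside a bounding box can be sketched as below. The sketch follows the documented `(xmin, ymin, zmin, xmax, ymax, zmax)` layout, but the rule of placing centers half a voxel in from the lower corner and the name `voxel_centers` are assumptions, not the library's implementation.

```python
import numpy as np


def voxel_centers(bbox, voxel_size: float = 1.0) -> np.ndarray:
    """Return an (N, 3) array of voxel centers covering the bounding box."""
    mins = np.asarray(bbox[:3], dtype=float)
    maxs = np.asarray(bbox[3:], dtype=float)
    # Number of voxels needed per axis, with at least one voxel on flat axes.
    counts = np.maximum(np.ceil((maxs - mins) / voxel_size).astype(int), 1)
    axes = [mins[d] + (np.arange(counts[d]) + 0.5) * voxel_size for d in range(3)]
    grid = np.meshgrid(*axes, indexing="ij")
    return np.stack([g.ravel() for g in grid], axis=1)


centers = voxel_centers((0, 0, 0, 2, 2, 2), voxel_size=1.0)
```

For the 2 x 2 x 2 box above with a voxel size of 1, the grid holds 8 centers, running from (0.5, 0.5, 0.5) to (1.5, 1.5, 1.5).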
- points_in_patch(coord: ndarray, patch_center: ndarray) bool
Utility class function for filtering the point cloud and returning only the points inside the patch.
- Parameters:
coord (np.ndarray) – 3D coordinate array.
patch_center (np.ndarray) – Array (1, 3) for the given patch center.
- Returns:
Array of all points that are enclosed in the given patch.
- Return type:
tuple(bool)
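Filtering points by patch membership can be sketched as a boolean mask over an axis-aligned cube around the patch center. The explicit `patch_size` argument is a hypothetical addition made so the sketch is self-contained. The documented method takes only `coord` and `patch_center`, and presumably reads the patch size from the class state.

```python
import numpy as np


def points_in_patch(
    coord: np.ndarray, patch_center: np.ndarray, patch_size: float
) -> np.ndarray:
    """Return a boolean mask marking points inside the cube around the center."""
    center = np.asarray(patch_center, dtype=float).reshape(-1)
    lower = center - patch_size / 2
    upper = center + patch_size / 2
    # A point is inside only if it lies within the bounds on every axis.
    return np.all((coord >= lower) & (coord <= upper), axis=1)


cloud = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [3.0, 0.0, 0.0]])
mask = points_in_patch(cloud, np.array([[0.5, 0.5, 0.5]]), patch_size=2.0)
inside = cloud[mask]
```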
- optimal_patches(coord: ndarray, random=False) List[bool]
Main class function to compute optimal patch size.
The function takes the stored initial variables and iteratively searches for a voxel size small enough that every patch contains at most the maximum number of points.
- Parameters:
coord (np.ndarray) – List of coordinates to voxelize.
random
- static normalize_idx(coord_with_idx: ndarray) ndarray
Utility class function to replace IDs with ordered output ID values for each point in the patches. In other words, it produces a standardized ID for each point, so that the point can be matched with the source.
- Parameters:
coord_with_idx (np.ndarray) – Coordinate id value i.
- Returns:
An array of all points in a patch with corrected ID values.
- Return type:
np.ndarray
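One plausible reading of this ID standardization is sketched below: the IDs in the first column are remapped to consecutive integers starting at 0, while the coordinates are left untouched. Both the column layout `[id, x, y, z]` and the remapping rule are assumptions, and the name `normalize_ids` is hypothetical. The library's own method may order the IDs differently.

```python
import numpy as np


def normalize_ids(coord_with_idx: np.ndarray) -> np.ndarray:
    """Remap the ID column to 0..K-1, keeping equal IDs equal."""
    out = np.array(coord_with_idx, dtype=float)  # copy, the input stays unchanged
    # `inverse` holds, for every row, the rank of its ID among the sorted unique IDs.
    _, inverse = np.unique(out[:, 0], return_inverse=True)
    out[:, 0] = inverse
    return out


patch = np.array(
    [
        [7, 0.0, 0.0, 0.0],
        [7, 1.0, 0.0, 0.0],
        [12, 0.0, 2.0, 0.0],
        [3, 5.0, 5.0, 5.0],
    ]
)
normalized = normalize_ids(patch)
```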
- output_format(data: ndarray) ndarray | Tensor
Utility class function to output an array in the correct format (NumPy or tensor).
- Parameters:
data (np.ndarray) – Input data for format change.
- Returns:
Array in the format specified by self.torch_output.
- Return type:
np.ndarray or torch.Tensor
- patched_dataset(coord: ndarray, label_cls=None, rgb=None, mesh=6, random=False, voxel_size=None) Tuple[List, List, List, List, List] | Tuple[List, List, List, List]