DIST -> Utils

Build point cloud from image

class tardis_em.dist_pytorch.utils.build_point_cloud.BuildPointCloud

A utility class for handling and processing images to build point cloud data, typically from semantic masks. This class includes functions to validate and adjust input image data, as well as convert binary semantic masks into point cloud data. The generated point cloud data can be represented in 2D or 3D space.

static check_data(image: str | ndarray) → ndarray | None

Checks and processes a binary input image to ensure it meets the required conditions for a semantic mask. The function supports input as a file path to a .tiff image or a numpy array, and it returns the processed image if all conditions are satisfied.

The function performs the following checks and adjustments:

  • Reads the image from a file path if a string is provided as input.

  • Validates that the image has either 2 or 3 dimensions.

  • Ensures the image is binary (contains only 0 and 1 values).

  • Converts the binary image values appropriately to ensure consistency.

  • Normalizes the image if necessary.

Parameters:

image (Union[str, np.ndarray]) – Input image that can be either a string representing a file path to a .tiff image or a numpy array. The input file or array is checked and formatted as required.

Returns:

Processed numpy array representing the validated binary image, or None if the image fails the binary or formatting checks.

Return type:

Union[np.ndarray, None]

Raises:
  • RuntimeWarning – If the input directory or .tiff image file is incorrect.

  • TardisError – If the input image fails any validation checks, such as having incorrect dimensions or not being binary.
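The validation steps above can be sketched with NumPy alone. This is a minimal, hypothetical illustration (the name `validate_mask` is not part of tardis_em): it skips .tiff loading, raises `ValueError` where the library raises `TardisError`, and assumes that "normalizing" means mapping any two-valued mask with a 0 background, such as {0, 255}, to {0, 1}.

```python
import numpy as np


def validate_mask(image):
    """Hypothetical stand-in for BuildPointCloud.check_data (NumPy input only)."""
    image = np.asarray(image)

    # Only 2D images or 3D volumes are accepted.
    if image.ndim not in (2, 3):
        raise ValueError(f"Expected a 2D or 3D mask, got {image.ndim} dimensions.")

    # A semantic mask may contain at most two distinct values.
    if np.unique(image).size > 2:
        return None

    # Assumption: background is 0, so every nonzero value becomes 1.
    return (image > 0).astype(np.uint8)
```

For example, a mask stored as {0, 255} comes back as a `uint8` array of {0, 1}, while a mask with three distinct labels yields `None`.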

build_point_cloud(image: str | ndarray, down_sampling: float | None = None, as_2d=False) → Tuple[ndarray, ndarray] | ndarray

Generates a point cloud from an input image through skeletonization, with optional down-sampling and control of dimensionality reduction to 2D.

The function processes the input image by skeletonizing it to extract the pixel-wise skeleton. Depending on the image dimensions or user-provided parameters, a 2D or 3D spatial representation of the skeleton is then constructed. The point cloud data is returned in a structured format after optional down-sampling.

Parameters:
  • image (Union[str, np.ndarray]) – The input data to generate the point cloud from. It can be a string representing the file path to an image or a numpy array of the input data.

  • down_sampling (Union[float, None]) – Optional down-sampling factor to reduce the point cloud density. If provided, it defines the voxel size for filtering points in 3D space.

  • as_2d (bool) – Boolean flag for forcing the representation of the point cloud in two dimensions irrespective of the dimensionality of the input data.

Returns:

If down_sampling is set, returns a tuple containing (1) the generated point cloud as a numpy array with X, Y (and optionally Z) coordinates and (2) the down-sampled point cloud. If down_sampling is None, only the point cloud array is returned, without being wrapped in a tuple.

Return type:

Union[Tuple[ndarray, ndarray], np.ndarray]
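The dimensionality handling described above can be illustrated with the step that follows skeletonization: turning the nonzero pixels of a (skeletonized) mask into coordinates. This is a sketch under stated assumptions: the helper `mask_to_points` is hypothetical, skeletonization and down-sampling are omitted, and dropping Z (then de-duplicating XY) for `as_2d` is an assumption rather than documented library behaviour.

```python
import numpy as np


def mask_to_points(mask, as_2d=False):
    """Hypothetical helper: coordinates of all nonzero pixels/voxels of a mask."""
    mask = np.asarray(mask)
    # np.argwhere returns array-order indices: (y, x) in 2D, (z, y, x) in 3D.
    idx = np.argwhere(mask > 0)
    # Reverse the columns to obtain (x, y) or (x, y, z) coordinates.
    points = idx[:, ::-1].astype(np.float64)
    if as_2d and points.shape[1] == 3:
        # Assumption: flattening to 2D drops Z and removes duplicate XY points.
        points = np.unique(points[:, :2], axis=0)
    return points
```

Under the documented return convention, such an array would be returned on its own, or paired with its down-sampled copy when down_sampling is set.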

tardis_em.dist_pytorch.utils.build_point_cloud.draw_line(p0: ndarray, p1: ndarray, line_id: int) → ndarray

Generates a sequence of 3D coordinates between two specified points using an incremental line drawing algorithm. The method determines the dominant axis of movement (the axis with the largest coordinate difference), advances one step along it at a time, and uses integer error terms to decide when to also step along the other two axes, producing a connected voxel path between the endpoints.

This is a computationally efficient implementation of a generalized Bresenham’s algorithm adapted for 3D space. Each generated coordinate is stored along with its corresponding line ID.

Parameters:
  • p0 (np.ndarray) – Starting point in the form of a NumPy array with [z, y, x] coordinates.

  • p1 (np.ndarray) – Ending point in the form of a NumPy array with [z, y, x] coordinates.

  • line_id (int) – Identifier associated with the generated line.

Returns:

The generated line as a NumPy 2D array where each row consists of [line_id, x, y, z].

Return type:

np.ndarray
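A generalized 3D Bresenham line, as described above, can be written compactly by treating the axis with the largest coordinate difference as the driving axis and keeping one integer error term per remaining axis. This is an illustrative sketch, not the tardis_em source (the name `bresenham_3d` is hypothetical); following the documentation, it takes [z, y, x] endpoints and emits [line_id, x, y, z] rows.

```python
import numpy as np


def bresenham_3d(p0, p1, line_id):
    """Illustrative 3D Bresenham line between integer points given as [z, y, x]."""
    start = np.asarray(p0, dtype=np.int64)[::-1].copy()  # -> [x, y, z], own copy
    end = np.asarray(p1, dtype=np.int64)[::-1]
    delta = np.abs(end - start)
    step = np.where(end >= start, 1, -1)

    drive = int(np.argmax(delta))  # dominant (driving) axis
    others = [axis for axis in range(3) if axis != drive]
    # One integer decision variable per non-driving axis.
    err = {axis: 2 * delta[axis] - delta[drive] for axis in others}

    current = start
    points = [[line_id, *current]]
    for _ in range(int(delta[drive])):
        current[drive] += step[drive]
        for axis in others:
            if err[axis] >= 0:
                current[axis] += step[axis]
                err[axis] -= 2 * delta[drive]
            err[axis] += 2 * delta[axis]
        points.append([line_id, *current])
    return np.array(points, dtype=np.int64)
```

Only integer additions and comparisons occur inside the loop, which is what makes Bresenham-style rasterization cheap compared with sampling a parametric line and rounding.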

tardis_em.dist_pytorch.utils.build_point_cloud.quadratic_bezier(p0: ndarray, p1: ndarray, p2: ndarray, t: float) → list

Calculate the position on a quadratic Bézier curve at a given parameter t.

The function computes the coordinates of a point on a quadratic Bézier curve defined by the control points p0, p1, and p2 at parameter t, using B(t) = (1 − t)²·p0 + 2(1 − t)t·p1 + t²·p2. The curve starts at p0 (t = 0) and ends at p2 (t = 1), while p1 pulls it toward itself without, in general, lying on it. The computed point is returned as a list of rounded integer coordinates.

Parameters:
  • p0 – The first control point of the Bézier curve as a NumPy array.

  • p1 – The second control point of the Bézier curve as a NumPy array.

  • p2 – The third control point of the Bézier curve as a NumPy array.

  • t – The parameter value along the curve where the computation is to be performed. The value of t must lie within the interval [0, 1].

Returns:

A list of integers representing the coordinates of the computed point on the Bézier curve.
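The formula above translates directly into NumPy. This is a sketch rather than the library function (the name `bezier_point` is hypothetical); it uses `np.rint`, which rounds halves to the nearest even integer, and the library's rounding rule may differ.

```python
import numpy as np


def bezier_point(p0, p1, p2, t):
    """Point on a quadratic Bezier curve at parameter t in [0, 1], rounded to integers."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    p0, p1, p2 = (np.asarray(p, dtype=np.float64) for p in (p0, p1, p2))
    # B(t) = (1 - t)^2 * p0 + 2 * (1 - t) * t * p1 + t^2 * p2
    point = (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2
    # np.rint rounds half to even, e.g. 2.5 -> 2.
    return [int(v) for v in np.rint(point)]
```

For p0 = [0, 0], p1 = [5, 10], p2 = [10, 0] and t = 0.5, the weights are 0.25, 0.5, and 0.25, giving the point [5, 5].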

tardis_em.dist_pytorch.utils.build_point_cloud.draw_curved_line(p0: ndarray, p1: ndarray, p2: ndarray, line_id: int) → ndarray

Draws a quadratic Bézier curve from three control points and returns the resulting points, each prefixed with a line identifier. The number of points needed to represent the curve is estimated from the distances between the control points, and the points are distributed evenly along the curve.

Parameters:
  • p0 – The first control point as a NumPy array.

  • p1 – The second control point as a NumPy array.

  • p2 – The third control point as a NumPy array.

  • line_id – The identifier to prepend to each point in the resultant curve.

Returns:

A NumPy array containing the computed points of the curve. Each row represents a point with the structure [line_id, x, y], where x and y are the coordinates of the curve point.
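A sketch of this procedure, under stated assumptions: the helper `draw_bezier_curve` is hypothetical, the point count is taken as the length of the control polygon |p1 − p0| + |p2 − p1| (an upper bound on the curve length, so roughly one sample per unit of curve), and the samples are evenly spaced in t rather than in arc length.

```python
import numpy as np


def draw_bezier_curve(p0, p1, p2, line_id):
    """Hypothetical sketch: sampled quadratic Bezier curve with a leading ID column."""
    p0, p1, p2 = (np.asarray(p, dtype=np.float64) for p in (p0, p1, p2))
    # Control-polygon length bounds the curve length from above.
    n = max(int(np.ceil(np.linalg.norm(p1 - p0) + np.linalg.norm(p2 - p1))), 2)
    t = np.linspace(0.0, 1.0, n)[:, None]  # column vector for broadcasting
    pts = (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2
    ids = np.full((n, 1), line_id, dtype=np.float64)
    return np.hstack([ids, np.rint(pts)])
```

With p0 = [0, 0], p1 = [5, 10], p2 = [10, 0], the control polygon is about 22.4 units long, so 23 rows are produced, the first at p0 and the last at p2.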

tardis_em.dist_pytorch.utils.build_point_cloud.draw_circle(center: ndarray, radius: float, circle_id: int, _3d=False, size=None) → ndarray

Generates the points of a circle in 2D or a sphere in 3D by calculating the coordinates of points lying on their respective perimeters based on the given center and radius.

This function supports both 2D and 3D spaces. If _3d is True, it calculates the points of a 3D sphere within the bounds given by size. Otherwise, it computes the points of a 2D circle in the XY plane, optionally positioned on a given Z plane.

Parameters:
  • center – The center of the circle or sphere. Should be a numpy array with either 2 or 3 components (x, y[, z]).

  • radius – The radius of the circle or sphere.

  • circle_id – An identifier assigned to each point in the circle/sphere. This value will be included as the first element in each resulting coordinate.

  • _3d – A boolean to determine whether to compute a 3D sphere (True) or a 2D circle (False). Defaults to False.

  • size – Optional size constraints for the 3D sphere (e.g., bounding box dimensions). This parameter is only used when _3d is True.

Returns:

An array of point coordinates, each including the circle_id. For 2D, the coordinates are in (circle_id, x, y, z) format. For 3D, they are in (circle_id, z, y, x) format.

Return type:

numpy.ndarray
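For the 2D case, a minimal sketch, assuming roughly one sample per unit of circumference and the (circle_id, x, y, z) row layout documented above: the helper `circle_points` is hypothetical, and treating a third component of center as the Z plane is an assumption.

```python
import numpy as np


def circle_points(center, radius, circle_id):
    """Hypothetical sketch: integer points on a circle in the XY plane."""
    center = np.asarray(center, dtype=np.float64)
    cz = center[2] if center.size == 3 else 0.0  # assumed optional Z plane
    # About one sample per unit of circumference (at least 8).
    n = max(int(np.ceil(2 * np.pi * radius)), 8)
    theta = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
    x = np.rint(center[0] + radius * np.cos(theta))
    y = np.rint(center[1] + radius * np.sin(theta))
    ids = np.full(n, circle_id, dtype=np.float64)
    rows = np.column_stack([ids, x, y, np.full(n, cz)])
    # Rounding can map neighbouring samples to the same pixel; drop duplicates.
    return np.unique(rows, axis=0)
```

Because each coordinate is rounded independently, every returned point lies within √0.5 ≈ 0.71 of the ideal radius.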

tardis_em.dist_pytorch.utils.build_point_cloud.draw_sphere(center, radius, sheet_id)

Generates a set of points on the surface of a sphere with a specified radius and center and associates them with a given sheet ID. The points are approximately evenly distributed across the surface of the sphere.

Parameters:
  • center – The 3D coordinates of the center of the sphere. Should be an array-like structure of three numeric values representing (x, y, z).

  • radius – The radius of the sphere. A non-negative float value.

  • sheet_id – An identifier for the generated points, typically a numeric or string value. Each point will be associated with this ID.

Returns:

A 2D NumPy array where each row represents a point. The first column contains the sheet ID, and the subsequent columns contain the 3D coordinates (x, y, z) of the point.
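One common way to place approximately evenly distributed points on a sphere is the Fibonacci (golden-angle) lattice. The sketch below uses it as an assumption; the source does not say which scheme draw_sphere uses. The helper name `fibonacci_sphere` and the density of about one point per unit of surface area are hypothetical, and sheet_id is assumed numeric.

```python
import numpy as np


def fibonacci_sphere(center, radius, sheet_id, n_points=None):
    """Hypothetical sketch: golden-angle lattice on a sphere, rows [id, x, y, z]."""
    center = np.asarray(center, dtype=np.float64)
    if n_points is None:
        # Assumed density: about one point per unit of surface area.
        n_points = max(int(np.ceil(4 * np.pi * radius ** 2)), 1)
    i = np.arange(n_points) + 0.5
    z = 1 - 2 * i / n_points           # evenly spaced heights in (-1, 1)
    r = np.sqrt(1 - z ** 2)            # ring radius at each height
    theta = np.pi * (3 - np.sqrt(5)) * i  # golden-angle increments
    unit = np.column_stack([r * np.cos(theta), r * np.sin(theta), z])
    ids = np.full((n_points, 1), sheet_id, dtype=np.float64)
    return np.hstack([ids, center + radius * unit])
```

Evenly spaced heights give equal-area bands (Archimedes' hat-box theorem), and the golden-angle rotation keeps neighbouring bands from lining up, which is why the lattice looks nearly uniform.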

tardis_em.dist_pytorch.utils.build_point_cloud.draw_sheet(center: ndarray, size: tuple, sheet_id: int) → ndarray

Generates a 3D representation of a sheet with specified size, position, and a unique identifier. The generated sheet includes transformations and rotations to simulate a more varied geometric structure, and filters out points outside the specified bounds.

Parameters:
  • center – A 3D ndarray representing the center point of the sheet.

  • size – A tuple defining the 3D dimensions (depth, height, width) of the sheet.

  • sheet_id – An integer unique identifier for the sheet being generated.

Returns:

An ndarray containing the generated points of the sheet. Each row includes the sheet ID and the coordinates (x, y, z) of a point. If no points lie within the bounds, an empty ndarray is returned.

tardis_em.dist_pytorch.utils.build_point_cloud.create_simulated_dataset(size, sim_type: str)

Generates a simulated dataset based on parameters specifying the type of simulation and the size of the dataset. The function creates different geometric shapes, such as lines, curves, circles, and spheres, depending on the specified simulation type.

Parameters:
  • size – The dimensions of the dataset as a tuple of integers, representing its shape (e.g., (height, width, depth)).

  • sim_type – A string specifying the type of simulation to create. Must be one of the following valid options: “mix3d”, “mix2d”, “filaments”, “membranes”, “membranes2d”.

Returns:

The generated dataset. The output is a NumPy array where each row represents a coordinate or shape in the simulated space.

Segment point cloud from graph

class tardis_em.dist_pytorch.utils.segment_point_cloud.PropGreedyGraphCut(threshold=0.5, connection=2, smooth=False)

Manages the process of stitching graph patches and their associated attributes, performing adjacency operations to create a cohesive graph network, and preprocessing connections to ensure consistency and limit redundant associations.

This class is designed for scenarios where graph predictions and corresponding node attributes (e.g., coordinates and class probabilities) are generated in patches and need to be combined into a unified representation. The provided methods allow for detailed control over graph stitching, class and coordinate merging, adjacency list construction, and preconditioning of connection data.

preprocess_connections(adj_matrix)

Preprocess the adjacency matrix to ensure mutual connections and limit the number of top connections for each node based on their probabilities.

This method processes the given adjacency matrix by first identifying potential top connections for each node, ensuring mutual connections exist between nodes, and finally limiting the connections to the top N based on their probability values. The updated adjacency matrix is returned with these modifications.

Parameters:

adj_matrix – A list of rows, each describing one node as a tuple of four elements: (node index, other metadata, list of connections, list of probabilities). Connections are given as indices of connected nodes, and probabilities indicate the strength of these connections.

Returns:

The modified adjacency matrix, in which only mutual connections are kept and each node's connections are limited to a fixed number of top connections by probability.
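The three steps above (candidate selection, mutual check, top-N limit) can be sketched on the documented row format. This is a hypothetical illustration: `filter_mutual_connections`, `candidate_k`, and `max_connections` are not library names, and the assumption that the final limit corresponds to the class's `connection` argument is not confirmed by the source.

```python
import numpy as np


def filter_mutual_connections(adj_matrix, candidate_k=4, max_connections=2):
    """Hypothetical sketch of mutual top-N edge filtering on (idx, meta, conns, probs) rows."""
    # 1) Keep each node's `candidate_k` most probable neighbours as candidates.
    candidates = {}
    for node, _meta, conns, probs in adj_matrix:
        order = np.argsort(probs)[::-1][:candidate_k]
        candidates[node] = {conns[i]: probs[i] for i in order}

    filtered = []
    for node, meta, _conns, _probs in adj_matrix:
        # 2) Keep an edge only if each node lists the other as a candidate.
        mutual = [
            (other, p)
            for other, p in candidates[node].items()
            if node in candidates.get(other, {})
        ]
        # 3) Limit to the `max_connections` most probable mutual edges.
        mutual.sort(key=lambda edge: edge[1], reverse=True)
        mutual = mutual[:max_connections]
        filtered.append((node, meta, [o for o, _ in mutual], [p for _, p in mutual]))
    return filtered
```

Requiring mutuality removes one-sided edges, for example a node whose best neighbour prefers someone else, before the per-node limit is applied.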

patch_to_segment(graph: list, coord: ndarray | list, idx: list, prune: int, sort=True, visualize: str | None = None) → ndarray

Converts a patch of graph data into segments based on geometric and adjacency relationships within the input data. This function processes spatial graph representations and outputs segmented components based on provided criteria like pruning and optional visualization.

Parameters:
  • graph (list) – Input graph representation(s), which can be a numpy array, a Torch tensor, or a list of these types. Represents the connectivity information of the graph data.

  • coord (Union[np.ndarray, list]) – Coordinate data of the graph nodes in either numpy array or list format. Contains spatial locations of nodes in the graph.

  • idx (list) – Index information about the selected components of the graph (e.g., subclusters of a larger graph) to be processed.

  • prune (int) – Minimum threshold for the number of points required in a segment to be considered valid. Segments smaller than this threshold will be ignored.

  • sort (bool) – A flag that determines whether the output points in each segment should be sorted geometrically. Defaults to True.

  • visualize (Optional[str]) – An optional flag indicating whether the resulting segmented data should be visualized and in what mode (e.g., point cloud view or filament view). Accepts values like “f” or “p”.

Returns:

Segmented graph components, represented as a numpy array, where each row corresponds to a node and includes attributes like segment ID and spatial coordinates.

Return type:

np.ndarray
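The segmentation and pruning idea can be illustrated with a union-find pass over an edge list. This is a simplified, hypothetical sketch (`segment_components` is not a library function): it takes a plain list of (i, j) edges instead of the library's graph input, ignores idx, sort, and visualize, and emits rows of [segment_id, coordinates…] as described in the Returns section.

```python
import numpy as np


def segment_components(coord, edges, prune):
    """Hypothetical sketch: label connected components of an edge list, drop small ones."""
    coord = np.asarray(coord, dtype=np.float64)
    parent = list(range(len(coord)))

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    roots = np.array([find(i) for i in range(len(coord))])
    segments = []
    for root in np.unique(roots):
        members = np.flatnonzero(roots == root)
        if members.size < prune:  # too few points: discard this segment
            continue
        seg_id = np.full((members.size, 1), len(segments), dtype=np.float64)
        segments.append(np.hstack([seg_id, coord[members]]))

    if not segments:
        return np.empty((0, coord.shape[1] + 1))
    return np.vstack(segments)
```

Segment IDs are renumbered consecutively after pruning, so surviving segments are labelled 0, 1, 2, … without gaps.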

General Utils

tardis_em.dist_pytorch.utils.utils.pc_median_dist(pc: ndarray, avg_over=False, box_size=0.15) → float

Computes the median nearest neighbor distance for a given point cloud.

This function calculates the median nearest neighbor distance between points in a given 2D or 3D point cloud array. Optionally, it can restrict the computation to a subset of points that are within a bounding box region centered around the median position of the point cloud. The bounding box dimensions can be scaled based on a user-defined box_size.

Parameters:
  • pc (np.ndarray) – A 2D or 3D point cloud array of shape (N, D), where N is the number of points and D is the spatial dimensionality (2 or 3).

  • avg_over (bool, optional) – Flag to indicate whether to compute the distances over a subset of the point cloud within a bounding box. Defaults to False.

  • box_size (float, optional) – Fraction of the bounding box size relative to the point cloud extents. Only applicable when avg_over is True. Defaults to 0.15.

Returns:

The mean of the median nearest neighbor distances.

Return type:

float
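For reference, the underlying quantity (the median distance from each point to its nearest neighbour) can be computed with NumPy alone. This brute-force sketch is hypothetical (`median_nn_distance` is not a library name), ignores avg_over and box_size, and returns a plain median; for large clouds, a KD-tree query such as `scipy.spatial.cKDTree.query` with k=2 avoids the O(N²) distance matrix.

```python
import numpy as np


def median_nn_distance(pc):
    """Median distance from each point to its nearest neighbour (brute force)."""
    pc = np.asarray(pc, dtype=np.float64)
    # Pairwise Euclidean distances, shape (N, N): O(N^2) memory, small clouds only.
    diff = pc[:, None, :] - pc[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # ignore each point's distance to itself
    return float(np.median(dist.min(axis=1)))
```

For the 2D points (0, 0), (1, 0), and (3, 0), the nearest-neighbour distances are 1, 1, and 2, so the median is 1.0.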

tardis_em.dist_pytorch.utils.utils.point_in_bb(points: ndarray, min_x: int, max_x: int, min_y: int, max_y: int, min_z: float32 | None = None, max_z: float32 | None = None) → ndarray

Determines whether points in a given array fall within a specified bounding box.

The function evaluates if points, provided as an array, lie within the boundaries defined by minimum and maximum values for x, y, and optionally z coordinates. It enables the filtering of points based on inclusion within a 2D or 3D bounding box.

Parameters:
  • points (numpy.ndarray) – Array representing the coordinates of points, where each row is a point and columns correspond to x, y, and optionally z coordinates.

  • min_x (int) – Minimum x-coordinate boundary of the bounding box.

  • max_x (int) – Maximum x-coordinate boundary of the bounding box.

  • min_y (int) – Minimum y-coordinate boundary of the bounding box.

  • max_y (int) – Maximum y-coordinate boundary of the bounding box.

  • min_z (numpy.float32, optional) – Minimum z-coordinate boundary of the bounding box.

  • max_z (numpy.float32, optional) – Maximum z-coordinate boundary of the bounding box.

Returns:

A boolean array with one element per point, True where that point lies inside the bounding box.

Return type:

numpy.ndarray
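The bounding-box test reduces to a few vectorized comparisons. In this sketch the helper name `in_bbox` is hypothetical, the bounds are assumed inclusive (the source does not say), and the z test is applied only when both z bounds are given.

```python
import numpy as np


def in_bbox(points, min_x, max_x, min_y, max_y, min_z=None, max_z=None):
    """Boolean mask of points inside an axis-aligned box (inclusive bounds assumed)."""
    points = np.asarray(points)
    mask = (
        (points[:, 0] >= min_x) & (points[:, 0] <= max_x)
        & (points[:, 1] >= min_y) & (points[:, 1] <= max_y)
    )
    if min_z is not None and max_z is not None:
        # Only 3D points carry a z column.
        mask &= (points[:, 2] >= min_z) & (points[:, 2] <= max_z)
    return mask
```

The mask can index the cloud directly, for example `points[in_bbox(points, 0, 10, 0, 10)]`.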

class tardis_em.dist_pytorch.utils.utils.DownSampling(voxel=None, threshold=None, labels=True, KNN=False)

Provides functionality for down-sampling point cloud data, with optional RGB values and support for both single point clouds and lists of point clouds.

The class is designed to aid in voxel down-sampling operations on point cloud data. It supports two modes of operation: processing the entire point cloud at once, or down-sampling each element separately when the input is a list. It also supports point clouds with or without assigned class IDs (labels) and optional K-Nearest Neighbors (KNN) consideration.

static pc_down_sample(coord: ndarray, sampling, rgb=None) → Tuple[ndarray, ndarray] | ndarray

Perform down-sampling for the given coordinates and optionally their corresponding RGB values. If RGB values are provided, the method returns both the downsampled coordinates and their RGB values. Otherwise, only the downsampled coordinates are returned.

Parameters:
  • coord (np.ndarray) – Coordinates to down-sample.

  • sampling – Sampling parameters or logic to apply.

  • rgb (np.ndarray, optional) – Optional RGB values corresponding to the input coordinates.

Returns:

Down-sampled coordinates, and optionally RGB values if provided.

Return type:

Union[Tuple[np.ndarray, np.ndarray], np.ndarray]

class tardis_em.dist_pytorch.utils.utils.VoxelDownSampling(**kwargs)

Provides functionality for down-sampling 3D point clouds by grouping points into voxels of a specified size and computing the centroid of each voxel. Optionally supports operations with RGB data, labels, and nearest neighbor search using KDTree.

Useful for reducing the size of large point clouds for computational efficiency while preserving their spatial structure.

pc_down_sample(coord: ndarray, sampling: float, rgb: ndarray | None = None) → Tuple[ndarray, ndarray] | ndarray

Down-sample a point cloud using a voxel-based approach.

This function performs down-sampling of a point cloud by dividing the point cloud space into uniformly spaced 3D grid cells (voxels) and representing each voxel by its centroid. Optionally, it associates additional information, such as color or label attributes, with the down-sampled points.

Parameters:
  • coord – The 3D coordinates of the input point cloud. It is expected to be a NumPy array with shape (N, 3) where N is the number of points.

  • sampling – The size of the voxel grid’s cube edge. A smaller value results in a finer resolution, retaining more detail.

  • rgb – Optional. If provided, it should be a NumPy array with shape (N, 3) representing the RGB colors of the input point cloud. The colors of down-sampled points will be computed accordingly.

Returns:

  • If rgb is not None, returns a tuple:
    • First element is a NumPy array with shape (M, 3) or (M, 4), where M is the number of down-sampled points. Contains the 3D coordinates of the down-sampled points. If self.labels is true, an additional label column is included.

    • Second element is a NumPy array with shape (M, 3), representing the RGB colors associated with the down-sampled points.

  • If rgb is None, returns only the down-sampled 3D coordinates as a NumPy array with shape (M, 3) or (M, 4) depending on self.labels.
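The voxel-centroid scheme described above can be sketched with `np.unique` and `np.add.at`. This is an illustration, not the class's implementation: the name `voxel_down_sample` is hypothetical, labels and the KNN option are not handled, and RGB values are assumed to be averaged per voxel.

```python
import numpy as np


def voxel_down_sample(coord, sampling, rgb=None):
    """Hypothetical sketch: replace all points in each voxel by their centroid."""
    coord = np.asarray(coord, dtype=np.float64)
    # Integer voxel index of every point for a cubic grid of edge `sampling`.
    voxel_idx = np.floor(coord / sampling).astype(np.int64)
    _, inverse, counts = np.unique(
        voxel_idx, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)  # keep 1-D across NumPy versions
    n_vox = counts.size

    # Sum coordinates per voxel, then divide by the point count.
    sums = np.zeros((n_vox, coord.shape[1]))
    np.add.at(sums, inverse, coord)
    centroids = sums / counts[:, None]
    if rgb is None:
        return centroids

    # Assumption: colours are averaged per voxel, like the coordinates.
    rgb = np.asarray(rgb, dtype=np.float64)
    rgb_sums = np.zeros((n_vox, rgb.shape[1]))
    np.add.at(rgb_sums, inverse, rgb)
    return centroids, rgb_sums / counts[:, None]
```

`np.add.at` is used instead of `sums[inverse] += coord` because fancy-index assignment does not accumulate repeated indices.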

class tardis_em.dist_pytorch.utils.utils.RandomDownSampling(**kwargs)

A subclass of DownSampling that implements random down-sampling of a point cloud.

This class leverages random selection to down-sample point cloud data. It is suitable for reducing the size of datasets while maintaining a random subset of points. The class retains compatibility with additional node features such as RGB values during the down-sampling process.

static pc_down_sample(coord: ndarray, sampling, rgb: ndarray | None = None) → Tuple[ndarray, ndarray] | ndarray

Downsamples a point cloud represented by coordinates and optional RGB values based on a provided sampling strategy. The function either retains a fraction of the points or selects points based on a callable sampling strategy.

Parameters:
  • coord (np.ndarray) – The input point cloud coordinates. Each row represents a point. The shape of the numpy array is (N, D), where N is the number of points and D is the dimensionality of each point.

  • sampling (Union[int, float, Callable]) – The sampling strategy. Can be an integer, float, or callable. If an integer or float, specifies the fraction of points to keep. If a callable, it determines which points to keep dynamically.

  • rgb (Optional[np.ndarray]) – Optional array representing the RGB colors for each point in the point cloud. It has the same number of rows as coord, where each row corresponds to a point’s color.

Returns:

A subset of the point cloud coordinates. If rgb is provided, the function also returns a subset of the RGB values corresponding to the retained points.

Return type:

Union[Tuple[np.ndarray, np.ndarray], np.ndarray]
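For the fractional case, random down-sampling amounts to drawing indices without replacement and applying the same indices to coordinates and colours. This sketch covers only a numeric fraction (not the callable strategy); the name `random_down_sample` and the `seed` argument are hypothetical.

```python
import numpy as np


def random_down_sample(coord, fraction, rgb=None, seed=None):
    """Keep a random subset of int(len(coord) * fraction) points, without replacement."""
    coord = np.asarray(coord)
    rng = np.random.default_rng(seed)
    n_keep = int(len(coord) * fraction)
    # Sorted indices preserve the original point order.
    keep = np.sort(rng.choice(len(coord), size=n_keep, replace=False))
    if rgb is None:
        return coord[keep]
    # The same indices keep each colour aligned with its point.
    return coord[keep], np.asarray(rgb)[keep]
```

Passing a fixed seed makes the subset reproducible, which helps when comparing results across runs.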

tardis_em.dist_pytorch.utils.utils.check_model_dict(model_dict: dict) → dict

Processes the provided model_dict by mapping specific key patterns to predefined normalized keys and extracting their corresponding values into a new dictionary. This function simplifies the original dictionary representation into a standardized configuration dictionary that is ready for further use. Default values are also assigned to certain keys if they are not present in the provided dictionary.

Parameters:

model_dict (dict) – A dictionary containing model configuration parameters with keys following specific naming conventions.

Returns:

A standardized dictionary that includes key-value pairs extracted based on the specified naming conventions, along with default values for missing keys such as “num_cls”, “rgb_embed_sigma”, and “num_knn”.

Return type:

dict