Global -> Data Handlers

Import data formats

class tardis_em.utils.load_data.ImportDataFromAmira(src_am: str, src_img: str | None = None, src_surf: str | None = None)

Class for importing and handling data from Amira files.

Extracts spatial graphs, segment data, points, and vertices from Amira files. If image and surface data are provided, the class loads them as well and validates them as Amira-specific file formats. It is designed for loading AmiraMesh 3D models and extracting graphical and spatial data for further processing.

get_points() ndarray | None

Computes and returns the transformed points’ coordinates based on the spatial graph and image properties.

If the spatial_graph is None, it returns None. The method calculates the transformed coordinates of points, adjusting for spatial shifts and applying scaling based on the pixel size or other specified factors.

Raises:

IndexError – Raised if the ‘Coordinate’ information cannot be extracted from the spatial_graph because the expected content is missing or malformed.

Return type:

Union[numpy.ndarray, None]

Returns:

A numpy array of transformed point coordinates or None if the spatial_graph attribute is unset.

get_vertex() ndarray | None

Computes and returns the vertex coordinates of the spatial graph, with optional coordinate transformations applied. The result depends on whether the spatial_graph and src_img attributes are set, and the identified vertex coordinates are transformed and scaled accordingly.

Returns:

A numpy.ndarray containing the transformed and scaled vertex coordinates, or None if the spatial_graph attribute is None.

If the coordinate unit retrieved from the spatial graph is ‘nm’, then the vertex coordinates are scaled differently based on the pixel_size value.

get_segmented_points() ndarray | None

Generates segmented points based on the spatial graph and the computed segments. The segmentation assigns an index to each point in the graph, indicating which segment it belongs to.

Returns:

A numpy array of shape (N, 4) where each row represents a point with its corresponding segment index as the first element. The columns correspond to: [segment_index, x_coordinate, y_coordinate, z_coordinate]. Returns None if the spatial_graph is None.

Return type:

Union[numpy.ndarray, None]
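Because each row of the returned array is [segment_index, x, y, z], the points can be regrouped into one array per segment with plain NumPy. The sketch below uses a small synthetic array in place of real Amira output; the helper name split_segments is hypothetical and relies only on the column layout documented above.

```python
import numpy as np


def split_segments(points: np.ndarray) -> dict:
    """Group rows of an (N, 4) [segment_index, x, y, z] array by segment.

    Hypothetical helper: it only assumes the column layout documented
    for get_segmented_points().
    """
    segments = {}
    for seg_id in np.unique(points[:, 0]):
        # Keep the x, y, z columns of every row that belongs to this segment
        segments[int(seg_id)] = points[points[:, 0] == seg_id, 1:]
    return segments


# Synthetic example: two segments with 3 and 2 points
pts = np.array([
    [0, 0.0, 0.0, 0.0],
    [0, 1.0, 0.0, 0.0],
    [0, 2.0, 0.0, 0.0],
    [1, 5.0, 5.0, 1.0],
    [1, 6.0, 5.0, 1.0],
])
segs = split_segments(pts)
print({k: v.shape for k, v in segs.items()})  # {0: (3, 3), 1: (2, 3)}
```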

get_labels() dict | None

Determines and returns a dictionary representing labels and their corresponding segment points from the spatial graph, if available. The method first identifies label lines and segment definitions within the spatial_graph. It calculates ranges for each label and extracts associated data points. If no spatial_graph exists, it will return None.

Raises:

ValueError – If a label identifier is not found in the spatial_graph.

Returns:

A dictionary mapping label names to points belonging to each segment or None if spatial_graph is not defined.

Return type:

dict or None

get_image()

Retrieves the image and its corresponding pixel size.

Returns:

A tuple where the first element is the image and the second element is the pixel size.

Return type:

tuple

get_pixel_size() float

Retrieves the size of a single pixel.

Returns:

The size of the pixel.

Return type:

float

get_surface() Tuple | None

Returns the surface attribute of the object.

Returns:

Either the surface object or None if unavailable.

Return type:

Union[Tuple, None]

tardis_em.utils.load_data.load_tiff(tiff: str) Tuple[ndarray, float]

Load a TIFF file and return its data as a NumPy array along with an intensity scale factor.

Parameters:

tiff (str) – The path to the TIFF file to be loaded.

Returns:

A tuple containing the NumPy array representation of the TIFF file and a float representing the intensity scale factor.

Return type:

Tuple[np.ndarray, float]

Raises:

TardisError – If the specified TIFF file does not exist.

tardis_em.utils.load_data.mrc_read_header(mrc: str | bytes | None = None)

Reads the header of an MRC file and returns it as a named tuple. This function supports input as a file path or raw bytes. If a string is provided, the function will open the file in binary mode and read the first 1024 bytes as the header. If raw bytes are passed directly, it processes them as the header.

Parameters:

mrc – The MRC file path as a string, raw header bytes, or None.

Returns:

Named tuple representing the parsed header of the MRC file.
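A minimal sketch of the first parsing step, assuming the standard MRC2014 layout in which the 1024-byte header starts with four little-endian 32-bit integers: NX, NY, NZ and MODE. The MiniHeader tuple and the read_basic_header function are hypothetical; they cover only these four fields rather than the full named tuple returned by mrc_read_header, and they ignore files written in big-endian byte order.

```python
import struct
from collections import namedtuple

MiniHeader = namedtuple("MiniHeader", ["nx", "ny", "nz", "mode"])

HEADER_SIZE = 1024  # MRC headers are 1024 bytes long


def read_basic_header(mrc):
    """Parse NX, NY, NZ and MODE from a file path or raw header bytes."""
    if isinstance(mrc, str):
        # A path: read only the header block in binary mode
        with open(mrc, "rb") as f:
            mrc = f.read(HEADER_SIZE)
    # "<4i": four little-endian signed 32-bit integers
    return MiniHeader(*struct.unpack("<4i", mrc[:16]))


# Fake header: a 64 x 64 x 32 volume stored as mode 2 (float32)
raw = struct.pack("<4i", 64, 64, 32, 2) + bytes(HEADER_SIZE - 16)
print(read_basic_header(raw))  # MiniHeader(nx=64, ny=64, nz=32, mode=2)
```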

tardis_em.utils.load_data.mrc_write_header(*args) bytes

Constructs an MRC file header.

Parameters:

args – The arguments required to initialize an MRCHeader object.

Returns:

Packed binary data representing the MRC header.

tardis_em.utils.load_data.mrc_mode(mode: int, amin: int)

Determines the data type that corresponds to a given MRC mode, or the mode index that corresponds to a given data type. Image modes are mapped to their respective data types, amin is used to refine the result for mode 0, and the mode is validated against the known types.

Parameters:
  • mode (int) – The mode to evaluate. It can be either an integer or a specific type matching one of the predefined dtype values.

  • amin (int) – Minimum amplitude or intensity value used to refine the data type determination for the provided mode. Applicable when the mode is 0.

Returns:

Returns the corresponding data type or mode index based on the provided parameters. For integer mode inputs, it will return the dtype associated with the mode or an error will be raised for unsupported modes. For non-integer type matching, it will return the corresponding mode index.

Return type:

Union[numpy.dtype, str, int]
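The two-way mapping can be sketched as a lookup table over numeric modes from the MRC2014 standard. The table subset, the rule for mode 0 (signed int8 when amin is negative, uint8 otherwise) and the function name mode_to_dtype are assumptions for illustration, not necessarily the table used by mrc_mode.

```python
import numpy as np

# Subset of numeric modes from the MRC2014 specification (assumed table)
MODE_TO_DTYPE = {
    0: np.uint8,  # 8-bit; signed or unsigned, resolved with amin below
    1: np.int16,
    2: np.float32,
    4: np.complex64,
    6: np.uint16,
    12: np.float16,
}


def mode_to_dtype(mode, amin=0):
    """Map an MRC mode to a NumPy type, or a NumPy type back to its mode."""
    if isinstance(mode, (int, np.integer)):
        if mode == 0:
            # A negative minimum means the 8-bit data must be signed
            return np.int8 if amin < 0 else np.uint8
        if mode not in MODE_TO_DTYPE:
            raise ValueError(f"Unsupported MRC mode: {mode}")
        return MODE_TO_DTYPE[mode]
    # Reverse lookup: data type -> mode index
    for idx, dtype in MODE_TO_DTYPE.items():
        if np.dtype(mode) == np.dtype(dtype):
            return idx
    raise ValueError(f"No MRC mode for data type: {mode}")


print(mode_to_dtype(np.float32))  # 2
```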

tardis_em.utils.load_data.load_am(am_file: str)

Loads data from an AmiraMesh (.am) 3D image file and extracts image, pixel size, physical dimensions, and transformation details.

Parameters:

am_file (str) – File path of the .am file to be loaded.

Returns:

A tuple containing:
  • NumPy array of the image.
  • Pixel size in angstrom units.
  • Physical size of the image.
  • Transformation (offsets) in the bounding box.

Return type:

tuple[np.ndarray, float, float, np.ndarray]

Raises:

TardisError – If the file does not exist, is of unsupported format, or contains missing or invalid metadata.
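Pixel size and offset can be read from the textual part of an AmiraMesh file. The sketch below assumes a header containing a define Lattice line and a BoundingBox parameter, and assumes the bounding box spans the centers of the first and last voxels, so the spacing along x is (xmax - xmin) / (nx - 1). The function name parse_am_header is hypothetical, and load_am may compute these values differently (for example, converting units to angstrom).

```python
import re

import numpy as np


def parse_am_header(header: str):
    """Extract lattice size, pixel size and offset from an AmiraMesh header."""
    lattice = re.search(r"define Lattice\s+(\d+)\s+(\d+)\s+(\d+)", header)
    bbox = re.search(r"BoundingBox\s+([^,\n]+)", header)
    nx, ny, nz = (int(v) for v in lattice.groups())
    xmin, xmax, ymin, ymax, zmin, zmax = (float(v) for v in bbox.group(1).split())
    # Assumed: spacing between the centers of neighbouring voxels along x
    pixel_size = (xmax - xmin) / (nx - 1)
    offset = np.array([xmin, ymin, zmin])
    return (nx, ny, nz), pixel_size, offset


header = """# AmiraMesh BINARY-LITTLE-ENDIAN 2.1

define Lattice 64 64 32

Parameters {
    BoundingBox 10 640 0 630 0 310,
    CoordType "uniform"
}
"""
print(parse_am_header(header)[:2])  # ((64, 64, 32), 10.0)
```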

tardis_em.utils.load_data.load_am_surf(surf_file: str, simplify_f=None) Tuple

Parses an Amira surface file and extracts material names, grid properties, vertex data, and triangle data. It also optionally simplifies the geometry using Open3D.

This function reads the content of an Amira surface file and retrieves the material names, GridBox, GridSize, Vertices, and Triangle data. It organizes these data into structured formats and, if the simplify_f argument is supplied, simplifies the vertex and triangle meshes. It is designed for reading and processing surface geometry data in Amira format.

Parameters:
  • surf_file (str) – Path to the Amira surface file.

  • simplify_f (Optional[int]) – Level of simplification to apply to the mesh geometry. If None, no simplification is performed.

Returns:

A 4-tuple containing:
  • A list of material names extracted from the file.
  • A list comprising GridBox and GridSize arrays.
  • A list of vertex arrays for each material’s geometry.
  • A list of triangle arrays for each material’s geometry.

Return type:

Tuple[List[str], List[np.ndarray], List[np.ndarray], List[np.ndarray]]

tardis_em.utils.load_data.load_mrc_file(mrc: str) Tuple[ndarray, float] | Tuple[None, float]

Loads and processes an .mrc file to extract image data and pixel size. The function checks that the .mrc file exists, reads its header, and computes the pixel size from the dimensions stored in the header. It then loads the image data, falls back gracefully if the file is corrupted, and reshapes the data where necessary.

Parameters:

mrc (str) – The file path to the .mrc file to be loaded.

Returns:

A tuple containing the loaded image data and the pixel size (in Angstroms). If the file is corrupted and no valid image data can be retrieved, returns a tuple where the image data is None and pixel size is set to 1.0.

Return type:

Tuple[np.ndarray | None, float]
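In the MRC2014 format, the pixel spacing along an axis is the cell length (CELLA) divided by the number of samples along that axis (MX, MY, MZ). A sketch of that computation; the function name, the argument layout and the 1.0 fallback for a header without usable cell dimensions are assumptions, not necessarily what load_mrc_file does.

```python
import numpy as np


def mrc_pixel_size(cella, mxyz):
    """Pixel spacing along x from CELLA (x, y, z) and MX, MY, MZ."""
    cella = np.asarray(cella, dtype=float)
    mxyz = np.asarray(mxyz, dtype=float)
    if np.any(mxyz == 0) or np.all(cella == 0):
        return 1.0  # assumed fallback when the header has no usable cell size
    spacing = cella / mxyz  # per-axis spacing, in the header's length unit
    return float(spacing[0])


print(mrc_pixel_size([1280.0, 1280.0, 640.0], [128, 128, 64]))  # 10.0
```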

tardis_em.utils.load_data.load_nd2_file(nd2_dir: str) Tuple[ndarray, float]

Loads an ND2 file and processes the image into a specific format. This function is designed to read images from ND2 files, which may include movies or still images. It checks the dimensionality of the image and applies transformations to standardize the format before calculating certain statistical criteria to sort them. The result is the sorted image array and a scaling factor.

Parameters:

nd2_dir (str) – Path to the ND2 file to load.

Returns:

A tuple consisting of the processed image data as a numpy array and the scaling factor (float).

Return type:

Tuple[np.ndarray, float]

tardis_em.utils.load_data.load_ply_scannet(ply: str, downscaling=0, color: str | None = None) Tuple[ndarray, ndarray] | ndarray

Load and process a .ply file of the ScanNet dataset. This function reads the input .ply file, extracts point cloud data, and optionally loads RGB features or applies downscaling. It also maps ScanNet v2 labels to corresponding classes, if applicable.

Parameters:
  • ply (str) – Path to the .ply file containing point cloud data.

  • downscaling (int, optional) – Voxel size for downscaling the point cloud. If set to 0, no downscaling is applied.

  • color (str, optional) – Path to a secondary .ply file containing RGB features for the point cloud.

Returns:

Either a tuple containing the downsampled point cloud coordinates and RGB features, or the downsampled point cloud coordinates only, depending on whether a color file was provided.

Return type:

Union[Tuple[ndarray, ndarray], ndarray]

tardis_em.utils.load_data.load_ply_partnet(ply, downscaling=0) ndarray

Loads a .ply file in the PartNet format and processes its point cloud by extracting coordinates and color information. Optionally performs downscaling of the point cloud and assigns unique labels to points based on their colors.

Parameters:
  • ply (str or file-like) – The path or file-like object referring to the .ply file to be loaded.

  • downscaling (int) – The voxel size used for downscaling the point cloud. If 0, no downscaling is applied.

Returns:

A NumPy array containing the processed point cloud. The array consists of assigned label IDs and the downscaled or original coordinates.

Return type:

np.ndarray
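Assigning one label per distinct color can be done with np.unique and return_inverse, which maps every point to the index of its color among the sorted unique colors. The sketch below works on in-memory arrays instead of a .ply file and omits downscaling; the helper name colors_to_labels is hypothetical.

```python
import numpy as np


def colors_to_labels(coords: np.ndarray, colors: np.ndarray) -> np.ndarray:
    """Return an (N, 4) array [label_id, x, y, z] with one label per unique color."""
    # return_inverse maps every row of colors to the index of its unique color
    _, labels = np.unique(colors, axis=0, return_inverse=True)
    labels = labels.reshape(-1)  # keep the inverse 1-D across NumPy versions
    return np.hstack([labels[:, None].astype(coords.dtype), coords])


coords = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
colors = np.array([[255, 0, 0], [0, 0, 255], [255, 0, 0]])
print(colors_to_labels(coords, colors)[:, 0])  # [1. 0. 1.]
```

Unique rows are sorted lexicographically, so [0, 0, 255] receives label 0 and [255, 0, 0] receives label 1.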

tardis_em.utils.load_data.load_txt_s3dis(txt: str, rgb=False, downscaling=0) Tuple[ndarray, ndarray] | ndarray

Loads a point cloud dataset in .txt format and optionally applies downscaling and extracts RGB color information if present.

The function interprets the .txt file specified by the txt parameter as a space-separated file containing point cloud data. The first three columns are assumed to be the X, Y, and Z coordinates. Any additional columns are interpreted as RGB values. Users can optionally apply voxel-based downscaling to reduce the resolution of the point cloud.

The function returns either the coordinates of the point cloud or a tuple containing the coordinates and their corresponding RGB values, depending on the value of the rgb argument.

Parameters:
  • txt (str) – Path to the .txt file containing the point cloud data.

  • rgb (bool) – A flag indicating whether the RGB values should be extracted. If set to True, the RGB values, if available, are returned along with the coordinates. Defaults to False.

  • downscaling (int) – Voxel size for downscaling the point cloud. If set to a value greater than 0, the point cloud will be downsampled to reduce resolution using a voxel-based approach. Defaults to 0 (no downscaling).

Returns:

If rgb is True, returns a tuple of two numpy arrays: the first array containing the downscaled coordinates and the second array containing RGB values. If rgb is False, returns a single numpy array containing only the downscaled coordinates.

Return type:

Union[Tuple[np.ndarray, np.ndarray], np.ndarray]
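The column layout and voxel downscaling described above can be sketched with NumPy alone. The sketch reads whitespace-separated text from a buffer, keeps the first three columns as coordinates and the next three as RGB, and replaces every occupied voxel of edge length downscaling with the mean of its points. Treating voxel downscaling as per-voxel averaging is an assumption, and the names load_xyz_txt and voxel_mean are hypothetical.

```python
import io

import numpy as np


def voxel_mean(values, inverse, counts):
    # Average each column of values over the points that share a voxel
    return np.stack(
        [np.bincount(inverse, weights=values[:, i]) / counts
         for i in range(values.shape[1])],
        axis=1,
    )


def load_xyz_txt(txt, rgb=False, downscaling=0):
    data = np.loadtxt(txt, ndmin=2)  # whitespace-separated columns
    coords = data[:, :3]
    colors = data[:, 3:6] if rgb else None
    if downscaling > 0:
        # Integer voxel index of every point
        voxel = np.floor(coords / downscaling).astype(np.int64)
        _, inverse = np.unique(voxel, axis=0, return_inverse=True)
        inverse = inverse.reshape(-1)
        counts = np.bincount(inverse)
        coords = voxel_mean(coords, inverse, counts)
        if colors is not None:
            colors = voxel_mean(colors, inverse, counts)
    return (coords, colors) if rgb else coords


text = io.StringIO(
    "0.1 0.1 0.1 255 0 0\n"
    "0.3 0.2 0.1 255 0 0\n"
    "2.5 2.5 2.5 0 0 255\n"
)
xyz, col = load_xyz_txt(text, rgb=True, downscaling=1.0)
print(xyz.shape)  # (2, 3)
```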

tardis_em.utils.load_data.load_s3dis_scene(dir_s: str, downscaling=0, random_ds=None, rgb=False) Tuple[ndarray, ndarray] | ndarray

Loads and processes a Stanford Large-Scale 3D Indoor Spaces (S3DIS) scene from a specified directory. This function creates a scene containing 3D spatial coordinates and optionally RGB color data. It also allows for downscaling of the scene, either with a fixed downscaling factor or a random downsampling threshold. The output format depends on whether RGB data is included and whether any downscaling is applied.

Parameters:
  • dir_s (str) – Directory containing the S3DIS scene files.

  • downscaling (int, optional) – Downscaling factor to reduce the scene resolution by voxelizing. Default is 0, meaning no downscaling is applied.

  • random_ds (float, optional) – Random downsampling threshold. Overrides downscaling if provided. Default is None, meaning no random downsampling is applied.

  • rgb (bool, optional) – Boolean indicating whether to include RGB color data in the output. If True, extracts [R, G, B] values along with spatial coordinates. Defaults to False.

Returns:

If rgb is True, returns a tuple containing the processed coordinate array and normalized RGB data as numpy arrays. If rgb is False, returns only the processed coordinate array. If downscaling or random downsampling is applied, the data is downscaled accordingly.

Return type:

Union[Tuple[np.ndarray, np.ndarray], np.ndarray]

tardis_em.utils.load_data.load_image(image: str, normalize=False, px=True) ndarray | Tuple[ndarray, float]

Loads an image file and processes it based on its type. The function supports several image formats including TIFF, MRC, AM, and ND2. Depending on the file type, the image is loaded using the respective file-loader function, and optional normalization can be applied. Pixel size information may be returned for some file types.

Parameters:
  • image (str) – Path to the image file to be loaded.

  • normalize (bool) – Flag indicating whether to apply normalization to the loaded image. Defaults to False.

  • px (bool) – Flag indicating whether to return pixel size information along with the image. Defaults to True.

Returns:

The loaded image and optionally the pixel size information as a float if px is True. If px is False, only the image is returned.

Return type:

Union[np.ndarray, Tuple[np.ndarray, float]]
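The format dispatch can be sketched as a lookup on the file extension. The mapping below pairs each format named above with a loader documented on this page; the exact extension list and the pick_loader helper are assumptions, and load_image may recognise other suffixes.

```python
from pathlib import Path

# Assumed extension -> loader mapping (loader names as documented on this page)
LOADERS = {
    ".tif": "load_tiff",
    ".tiff": "load_tiff",
    ".mrc": "load_mrc_file",
    ".am": "load_am",
    ".nd2": "load_nd2_file",
}


def pick_loader(image: str) -> str:
    """Return the name of the loader for an image path (hypothetical helper)."""
    suffix = Path(image).suffix.lower()  # case-insensitive extension match
    if suffix not in LOADERS:
        raise ValueError(f"Unsupported image format: {suffix}")
    return LOADERS[suffix]


print(pick_loader("tomo_01.MRC"))  # load_mrc_file
```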

Export data formats

class tardis_em.utils.export_data.NumpyToAmira(as_point_cloud=False, header: list | None = None)

Handles the conversion of numpy arrays into Amira-compatible format for analysis or visualization. This class supports functionalities for exporting 3D spline data, point clouds, or spatial graphs. It also provides utilities to validate and format data for Amira. The header and data formats are customizable via options or additional input parameters.

check_3d(coord: ndarray | None = typing.List) List[ndarray] | ndarray

Check and process the given 3D coordinate data ensuring it adheres to specific requirements. Depending on the input type and its shape, transformations are applied to make data compatible with the expected format. The method supports numpy arrays and iterables like lists or tuples containing numpy arrays. It adjusts or validates dimensions where necessary and applies an optional reordering operation for segment identifiers.

Parameters:

coord (Optional[np.ndarray] or List[np.ndarray]) – The input coordinate data: either a numpy array or an iterable (list or tuple) of numpy arrays. The required dimensions depend on the processing logic.

Returns:

Processed coordinate data. The output type depends on the as_point_cloud attribute. It returns a numpy array if the input is a numpy array and as_point_cloud is enabled. Otherwise, it returns a list of numpy arrays where reordering based on segment identifiers has been applied.

Return type:

Union[List[np.ndarray], np.ndarray]

export_amira(file_dir: str, coords: tuple | list | numpy.ndarray = <class 'numpy.ndarray'>, labels: tuple | list | None = None, scores: list | None = None)

Exports 3D coordinates data into a format compatible with Amira visualization software. This function supports exporting point clouds or data with edge and vertex relationships, optionally including labels and scores for additional metadata. If exporting as a point cloud, only the coordinates are processed and written into the file. In the case of segments, vertices and edges, along with their related attributes, are written into structured sections of the Amira file format.

Parameters:
  • file_dir (str) – Path to the output file where the data will be written.

  • coords (Union[tuple, list, np.ndarray]) – Coordinates of 3D points as a tuple, list, or numpy array. When exporting as segments, each segment is identified by its index.

  • labels (Union[tuple, list, None]) – Optional labels for the coordinates. Labels should align with the number of arrays in coords. Strings or a list of strings can be provided.

  • scores (Optional[list]) – Optional score information for the edges, given as a list whose second element holds the scores for the corresponding edges.

Returns:

Returns None. The output is written to the file at the specified file_dir.

Return type:

None
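For segment data, an Amira spatial graph is commonly described by vertices (segment end points), edges (pairs of vertex indices), the number of points per edge, and the point coordinates themselves. The sketch below derives those four arrays from an (N, 4) [segment_id, x, y, z] array. It is a hypothetical illustration of the data layout, not the code used by export_amira, and it writes nothing to disk.

```python
import numpy as np


def segments_to_graph(coords: np.ndarray):
    """Split [segment_id, x, y, z] rows into Amira-style graph arrays."""
    vertices, edges, n_points, points = [], [], [], []
    for seg_id in np.unique(coords[:, 0]):
        seg = coords[coords[:, 0] == seg_id, 1:]
        first = len(vertices)
        # Each segment contributes its two end points as vertices,
        vertices.extend([seg[0], seg[-1]])
        # one edge connecting those two vertices,
        edges.append([first, first + 1])
        # and all of its points as the edge geometry
        n_points.append(len(seg))
        points.append(seg)
    return (np.array(vertices), np.array(edges, dtype=np.int64),
            np.array(n_points, dtype=np.int64), np.concatenate(points))


data = np.array([
    [0, 0.0, 0.0, 0.0], [0, 1.0, 0.0, 0.0], [0, 2.0, 0.0, 0.0],
    [1, 0.0, 5.0, 0.0], [1, 0.0, 6.0, 0.0],
])
v, e, n, p = segments_to_graph(data)
print(e.tolist(), n.tolist())  # [[0, 1], [2, 3]] [3, 2]
```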

tardis_em.utils.export_data.to_mrc(data: ndarray, pixel_size: float, file_dir: str, org_header: MRCHeader | None = None, label: List | None = None)

Converts numerical array data into MRC format and writes it to the specified path. An optional reference header and optional label information can be supplied. Both 2D and 3D arrays are supported, with headers and labels created accordingly.

Parameters:
  • data (np.ndarray) – The numerical array data to be saved in MRC format.

  • pixel_size (float) – Size of the pixel in the data.

  • file_dir (str) – Directory or file path where the MRC file will be saved.

  • org_header (MRCHeader, optional) – Optional MRC header to use as a reference when creating the new file.

  • label (List, optional) – Optional list of labels or metadata to be included in the MRC file.

Returns:

None

tardis_em.utils.export_data.to_am(data: ndarray, pixel_size: float, file_dir: str, header: list | None = None)

Converts a 3D NumPy array into AmiraMesh format and writes it to a specified file. The function generates a header describing the dataset and its attributes, alongside the binary data.

Parameters:
  • data (np.ndarray) – A 3-dimensional NumPy array representing the lattice data.

  • pixel_size (float) – Size of the pixel in the dataset. Used to calculate bounding box dimensions.

  • file_dir (str) – Path to save the AmiraMesh file.

  • header (list or None) – Optional. A list of additional headers to include in the AmiraMesh file. Items not starting with “#” will be prefixed with “#”. Defaults to None.

Returns:

None
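Header generation can be sketched as plain string formatting. Only the rule that extra header items gain a leading “#” comes from the description above; the rest of the layout (the # AmiraMesh BINARY-LITTLE-ENDIAN 2.1 first line, define Lattice, a BoundingBox from 0 to (n - 1) * pixel_size, a byte Lattice section tagged @1) and the (z, y, x) axis order are assumptions modelled on common AmiraMesh files, not the exact header written by to_am. The function name build_am_header is hypothetical.

```python
import numpy as np


def build_am_header(data: np.ndarray, pixel_size: float, header=None) -> str:
    """Return an AmiraMesh-style header for a (z, y, x) uint8 volume."""
    nz, ny, nx = data.shape  # assumed axis order of the NumPy array
    lines = ["# AmiraMesh BINARY-LITTLE-ENDIAN 2.1"]
    for item in header or []:
        # Extra header items must start with "#"
        lines.append(item if item.startswith("#") else f"#{item}")
    bbox = [0, (nx - 1) * pixel_size, 0, (ny - 1) * pixel_size,
            0, (nz - 1) * pixel_size]
    lines += [
        "",
        f"define Lattice {nx} {ny} {nz}",
        "",
        "Parameters {",
        "    BoundingBox " + " ".join(f"{b:g}" for b in bbox) + ",",
        '    CoordType "uniform"',
        "}",
        "",
        "Lattice { byte Data } @1",
        "",
        "@1",
    ]
    return "\n".join(lines) + "\n"


vol = np.zeros((4, 8, 16), dtype=np.uint8)
print(build_am_header(vol, 2.0, header=["created by sketch"]))
```

The binary voxel data would follow the final @1 marker.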

tardis_em.utils.export_data.to_stl(data: ndarray, file_dir: str)

Converts point cloud data stored in a numpy.ndarray into an STL file saved at the specified location. The function processes a MultiBlock dataset, writes a separate .stl file for each data part, and merges them into a single STL file. Helper functions save the individual STL files and rewrite the first line of each file to include its solid name.

Parameters:
  • data (numpy.ndarray) – The point cloud data as a numpy.ndarray. Each unique value in the first column is treated as a separate entity to convert into 3D geometry.

  • file_dir (str) – The path to the directory where the combined STL file will be saved.

Returns:

None

Image normalization

tardis_em.utils.normalization.adaptive_threshold(img: ndarray)

Perform adaptive thresholding on an image array using mean and standard deviation.

This function calculates the mean and standard deviation of the input image. If the image is multichannel (e.g., RGB), each channel is processed independently. The threshold is the mean divided by the standard deviation, and the image is binarized by comparing each pixel value to this threshold.

Parameters:

img (np.ndarray) – Input image as a NumPy array. For multichannel images, each channel is processed independently.

Returns:

A binarized image as a NumPy array with values set to 1 for pixels meeting the threshold criteria and 0 otherwise.

Return type:

np.ndarray
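A minimal NumPy sketch of the rule described above. The comparison direction (pixels greater than or equal to mean / std become 1), the channel_axis parameter used to mark multichannel input, and the function name are assumptions; a constant image has a standard deviation of zero and would need separate handling.

```python
import numpy as np


def adaptive_threshold_sketch(img: np.ndarray, channel_axis=None) -> np.ndarray:
    img = np.asarray(img, dtype=np.float64)
    if channel_axis is None:
        # One threshold for the whole image
        threshold = img.mean() / img.std()
    else:
        # Per-channel statistics, kept broadcastable against img
        ch = channel_axis % img.ndim
        axes = tuple(i for i in range(img.ndim) if i != ch)
        threshold = (img.mean(axis=axes, keepdims=True)
                     / img.std(axis=axes, keepdims=True))
    return (img >= threshold).astype(np.uint8)


print(adaptive_threshold_sketch(np.array([[0, 1], [2, 5]])))
```

For the 2 x 2 example, the mean is 2 and the standard deviation is about 1.87, so the threshold is about 1.07 and only the bottom row passes.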

class tardis_em.utils.normalization.SimpleNormalize

SimpleNormalize normalizes image data arrays to floating-point values in the range 0 to 1. It handles several integer image data types and adjusts the scaling range to the specific data type.
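One way to derive the range from the data type is np.iinfo, which reports the minimum and maximum of any NumPy integer type. The sketch below maps integer images onto [0, 1] using that full type range; the function name and passing floating-point input through unchanged are assumptions, not necessarily what SimpleNormalize does.

```python
import numpy as np


def simple_normalize(x: np.ndarray) -> np.ndarray:
    if np.issubdtype(x.dtype, np.integer):
        info = np.iinfo(x.dtype)
        # Shift by the type minimum and divide by the full type range
        return (x.astype(np.float32) - info.min) / (float(info.max) - info.min)
    return x.astype(np.float32)  # assumed: floating-point input is left as-is


print(simple_normalize(np.array([0, 255], dtype=np.uint8)))  # [0. 1.]
```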

class tardis_em.utils.normalization.MinMaxNormalize

Performs Min-Max normalization on input data.

This class normalizes input data to the range [0, 1] by scaling the values proportionally within this range using the minimum and maximum values of the input array. If the maximum value is less than or equal to zero, the normalization adjusts the input data before scaling by adding the absolute value of the minimum.
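A NumPy sketch of min-max scaling that includes the shift for non-positive data. Returning zeros for constant input is an added assumption to avoid division by zero, and the function name is hypothetical.

```python
import numpy as np


def min_max_normalize(x: np.ndarray) -> np.ndarray:
    x = x.astype(np.float32)
    if x.max() <= 0:
        # Shift described above; min-max scaling is itself shift-invariant,
        # so this step does not change the final result
        x = x + np.abs(x.min())
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)  # assumed handling of constant input
    return (x - x.min()) / span


print(min_max_normalize(np.array([-4, -2, 0])))  # [0.  0.5 1. ]
```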

class tardis_em.utils.normalization.MeanStdNormalize

A class for normalizing input data using mean and standard deviation.

This class standardizes input data by centering it to have a mean of zero and scaling it to have a standard deviation of one. It is designed for use in preprocessing image data or other numerical datasets to improve performance in machine learning models.
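Standardization amounts to subtracting the mean and dividing by the standard deviation. A short sketch; the function name and the small eps guarding against constant input are assumptions.

```python
import numpy as np


def mean_std_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    x = x.astype(np.float32)
    # eps avoids division by zero when all values are equal (assumption)
    return (x - x.mean()) / (x.std() + eps)


z = mean_std_normalize(np.array([1, 2, 3, 4]))
print(z.mean(), z.std())  # approximately 0 and 1
```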

class tardis_em.utils.normalization.RescaleNormalize(clip_range=(2, 98))

Performs rescale normalization on a given array.

This class is designed to normalize image or label mask input based on the specified clipping range using a percentile-based approach. It is particularly useful for preprocessing image data in various computer vision tasks, ensuring that output values fall within a defined intensity range.
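A percentile-based rescale can be sketched with np.percentile and np.clip: values outside the clip_range percentiles are clipped, and the remaining window is stretched linearly. Mapping the window onto [0, 1] is an assumption about the output range, and the function name is hypothetical.

```python
import numpy as np


def rescale_normalize(x: np.ndarray, clip_range=(2, 98)) -> np.ndarray:
    lo, hi = np.percentile(x, clip_range)
    # Values outside the percentile window are clipped to its bounds
    x = np.clip(x.astype(np.float32), lo, hi)
    if hi == lo:
        return np.zeros_like(x)  # assumed handling of a degenerate window
    return (x - lo) / (hi - lo)


out = rescale_normalize(np.arange(101))
print(out[0], out[50], out[-1])  # 0.0 0.5 1.0
```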

class tardis_em.utils.normalization.FFTNormalize(method='affine', alpha=900, beta_i=1, num_iters=100, sample=1, use_cuda=False)

static gmm_fit(x, pi=0.5, split=None, alpha=0.5, beta_f=0.5, scale=1.0, tol=0.001, num_iters=100, share_var=True, verbose=False)

Performs Expectation-Maximization (EM) fitting of a two-component Gaussian Mixture Model (GMM) with a Beta distribution prior on the mixing coefficient. This function estimates the parameters of the GMM (means, variances, and mixing coefficients) and returns the log-likelihood and other model parameters.

The method begins with an initial parameter assignment based on a threshold or quantile cut-off and iteratively optimizes the model parameters using EM until the log-likelihood converges or the maximum number of iterations is reached.

Parameters:
  • x – Input tensor for the data to be modeled.

  • pi – Initial mixing coefficient for the Gaussian components. Defaults to 0.5.

  • split – Threshold value for initializing component assignments. If None, defaults to a quantile-based value computed from the data.

  • alpha – Parameter for the Beta distribution prior on the mixing coefficient. Defaults to 0.5.

  • beta_f – Parameter for the Beta distribution prior on the mixing coefficient. Defaults to 0.5.

  • scale – Scaling factor for the log-likelihood computation. Defaults to 1.0.

  • tol – Convergence tolerance for the log-likelihood difference between iteration steps. Defaults to 1e-3.

  • num_iters – Maximum number of iterations for the EM algorithm. Defaults to 100.

  • share_var – Boolean indicating whether the Gaussian components share a single variance value. Defaults to True.

  • verbose – Boolean to enable printing of the log-likelihood at each iteration. Defaults to False.

Returns:

A tuple containing:
  1. Final log-likelihood of the fitted model.
  2. Estimated mean of the first Gaussian component.
  3. Estimated mean of the second Gaussian component.
  4. Estimated variance of the second Gaussian component.
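The EM loop can be illustrated with a compact NumPy version of a two-component Gaussian mixture with one shared variance. This is a generic sketch, not the implementation behind gmm_fit: it initialises the components by splitting the data at a quantile, updates the mixing weight with the posterior mean under a Beta(alpha, beta_f) prior, and stops when the log-likelihood changes by less than tol. The scale and verbose options are omitted, and the function name gmm_em_sketch is hypothetical.

```python
import numpy as np


def gmm_em_sketch(x, pi=0.5, split=None, alpha=0.5, beta_f=0.5,
                  tol=1e-3, num_iters=100):
    """Two-component GMM with a shared variance and a Beta prior on pi."""
    x = np.asarray(x, dtype=np.float64).ravel()
    n = x.size
    if split is None:
        split = np.quantile(x, 1 - pi)  # quantile-based initial cut-off
    resp = (x > split).astype(np.float64)  # responsibility of component 1
    loglik_prev = -np.inf
    for _ in range(num_iters):
        # M-step: means, shared variance and mixing weight
        n1 = resp.sum()
        n0 = n - n1
        mu0 = np.sum((1 - resp) * x) / max(n0, 1e-12)
        mu1 = np.sum(resp * x) / max(n1, 1e-12)
        var = np.sum((1 - resp) * (x - mu0) ** 2 + resp * (x - mu1) ** 2) / n
        var = max(var, 1e-12)
        pi = (n1 + alpha) / (n + alpha + beta_f)  # posterior mean under the Beta prior
        # E-step: per-point log-densities of both components
        log_norm = -0.5 * np.log(2 * np.pi * var)
        log_p0 = np.log(1 - pi) + log_norm - (x - mu0) ** 2 / (2 * var)
        log_p1 = np.log(pi) + log_norm - (x - mu1) ** 2 / (2 * var)
        log_px = np.logaddexp(log_p0, log_p1)
        resp = np.exp(log_p1 - log_px)
        loglik = log_px.sum()
        if abs(loglik - loglik_prev) < tol:
            break
        loglik_prev = loglik
    return loglik, mu0, mu1, var


rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 1000), rng.normal(5.0, 1.0, 1000)])
ll, mu0, mu1, var = gmm_em_sketch(x)
print(round(mu0, 1), round(mu1, 1), round(var, 1))
```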

norm_fit(x, alpha=900, beta_i=1, scale=1.0, num_iters=100, use_cuda=False)

Fits a normalization model to the input data by iteratively evaluating different probability mixtures while maximizing a log-likelihood measure. The function tries multiple initializations of probabilities for Gaussian Mixture Models (GMM) and evaluates their effectiveness using a combination of beta-prior distribution and the data’s log-likelihood. Optimization terminates when the initialization with the maximum likelihood is found.

The function supports either single-component fitting (pi=1) or GMM-based multi-component fitting (pi<1). It offers optional GPU acceleration and uses statistical tools from both PyTorch and SciPy.

Parameters:
  • x (numpy.ndarray or torch.Tensor) – Input data as a 1-dimensional array or tensor.

  • alpha (int) – Shape parameter for the beta prior distribution.

  • beta_i (int) – Second shape parameter for the beta prior distribution.

  • scale (float) – Scaling factor for the likelihood computation.

  • num_iters (int) – Number of iterations allowed for the GMM fitting process.

  • use_cuda (bool) – Boolean flag to determine if computations should run on GPU.

Returns:

Tuple containing the mean (mu) and standard deviation (std) of the best-fit normalization parameters optimized for maximum log-likelihood (-logp).

Return type:

tuple[float, float]