CNN -> Utils

CNN Builder

class tardis_em.cnn.utils.build_cnn.BasicCNN(model='CNN', in_channels=1, out_channels=1, sigmoid=True, num_conv_layer=5, conv_layer_scaler=64, conv_kernel=3, padding=1, pool_kernel=2, img_patch_size=64, layer_components='3gcl', dropout=None, num_group=8, prediction=False)

A neural network model class for constructing and managing a customizable Convolutional Neural Network (CNN) and its variations.

This class serves as a flexible framework for building encoder and decoder pipelines, defining the architecture of convolutional layers, and configuring associated parameters. It allows hyperparameters to be adapted to specific tasks such as image patch processing and segmentation. Additionally, it manages activation functions and final-layer outputs suitable for either classification or prediction.

update_patch_size(img_patch_size, sigmoid)

Updates the image patch sizes and rebuilds the model if necessary.

This function recalculates the patch sizes for an image as it traverses the convolutional layers. If the stored model is an already-built network rather than a string identifier, the CNN model is rebuilt while retaining the previous model’s state dictionary.

Parameters:
  • img_patch_size (int) – The initial size of the image patches to be processed.

  • sigmoid (Callable) – The sigmoid activation function applied to predictions to constrain outputs.

Returns:

None
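To illustrate the recalculation, here is a minimal sketch of how patch sizes shrink across encoder levels, assuming each pooling step halves the spatial size (the actual values depend on the configured kernel, padding, and pooling parameters):

    # Hypothetical illustration: per-level patch sizes when each pooling
    # step halves a 64-pixel input patch across num_conv_layer levels.
    img_patch_size = 64
    num_conv_layer = 5
    patch_sizes = [img_patch_size // 2 ** i for i in range(num_conv_layer)]
    # patch_sizes == [64, 32, 16, 8, 4]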

build_cnn_model()

Constructs a Convolutional Neural Network (CNN) or Recurrent Convolutional Neural Network (RCNN) model by building the encoder, decoder, and final layer from the provided configuration. This function handles the creation of the model architecture, including layers, activation functions, and other components.

class tardis_em.cnn.utils.build_cnn.UNet(model='CNN', **kwargs)

Implementation of a U-Net model derived from a basic convolutional neural network.

The U-Net model is widely used in image segmentation tasks. This class is designed to handle encoder-decoder architectures for feature extraction and reconstruction. It applies the U-Net structure with added flexibility for extensions or customizations as needed in specific tasks.

Based on “3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation” <https://arxiv.org/pdf/1606.06650.pdf>.

forward(x: Tensor) → Tensor

Performs the forward pass through the network, which includes an encoding stage, a decoding stage, and a final prediction step.

Parameters:

x (torch.Tensor) – Input tensor that will be passed through the encoder, decoder, and prediction stages of the network.

Returns:

Output tensor produced after applying the encoder, decoder, and the final activation (if prediction is enabled).

Return type:

torch.Tensor
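A hedged usage sketch: building a UNet with the documented constructor parameters and running a single patch through forward(). The (B, C, D, H, W) layout is an assumption based on the 3D default layer_components='3gcl', and the sketch assumes the constructor builds the network as described for build_cnn_model() above:

    import torch
    from tardis_em.cnn.utils.build_cnn import UNet

    model = UNet(in_channels=1, out_channels=1, num_conv_layer=5,
                 conv_layer_scaler=64, img_patch_size=64,
                 layer_components='3gcl', prediction=True)

    x = torch.rand(1, 1, 64, 64, 64)  # assumed (B, C, D, H, W) patch layout
    with torch.no_grad():
        y = model(x)  # per the docs, final activation applies when prediction is enabled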

class tardis_em.cnn.utils.build_cnn.ResUNet(model='RCNN', **kwargs)

A residual U-Net (ResUNet) model implementation extending the BasicCNN base class.

A class structure designed for image segmentation tasks utilizing a Residual U-Net architecture. The network is composed of an encoder and a decoder. The encoder extracts features, and the decoder reconstructs the features into a segmentation mask. It supports flexible customization for model variations through inheritance and additional parameters.

Modified from <https://doi.org/10.1016/j.isprsjprs.2020.01.013>.

forward(x: Tensor) → Tensor

Performs the forward pass through the network, which includes an encoding stage, a decoding stage, and a final prediction step.

Parameters:

x (torch.Tensor) – Input tensor that will be passed through the encoder, decoder, and prediction stages of the network.

Returns:

Output tensor produced after applying the encoder, decoder, and the final activation (if prediction is enabled).

Return type:

torch.Tensor

class tardis_em.cnn.utils.build_cnn.UNet3Plus(in_channels=1, out_channels=1, sigmoid=True, num_conv_layer=5, conv_layer_scaler=64, conv_kernel=3, padding=1, pool_kernel=2, img_patch_size=64, layer_components='3gcl', dropout=None, num_group=8, prediction=False, classifies=False, decoder_features=None)

UNet3Plus is a neural network model based on the U-Net architecture with enhancements specifically designed for segmentation tasks. It integrates improvements in both encoder and decoder components and optionally provides classification capabilities. This class is highly configurable and supports both 2D and 3D inputs.

The class consists of an encoder that extracts features from the input, a decoder that reconstructs the segmentation map from these features, optional classification modules, and a final prediction activation layer. Users can configure the number of convolutional layers, kernel sizes, pooling operations, dropout, and other settings to adapt the model to a wide variety of use cases.

Modified from <https://arxiv.org/abs/2004.08790>.

Components:
  • encoder – UNet3Plus encoder.
  • classifier – UNet3Plus classification head (optional; used when classifies=True).
  • decoder – UNet3Plus decoder.
  • final layer – Final prediction layer.

static dot_product(x: Tensor, x_cls: Tensor) → Tensor

Computes the dot product of two tensors x and x_cls along specific dimensions. The function first reshapes the input tensors to flatten the spatial dimensions (H, W) and depth (D) into a single dimension for efficient computation. Then, it performs an element-wise dot product of the reshaped tensors. The result is reshaped back into the original spatial and depth dimensions of the inputs.

Parameters:
  • x – A tensor of shape (B, N, D, H, W), where B represents the batch size, N represents the number of channels or features, D represents the depth, and H, W represent spatial dimensions.

  • x_cls – A tensor of shape (B, N, D, H, W), aligned with the same shape as x, used for element-wise dot product computation.

Returns:

A tensor of the same shape (B, N, D, H, W) as the input, containing the result of the dot product operation computed along specific dimensions.
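A minimal sketch of the operation as described: flatten (D, H, W) into one dimension, multiply element-wise, and restore the original shape. This illustrates the description above and is not the library’s implementation:

    import torch

    def dot_product_sketch(x: torch.Tensor, x_cls: torch.Tensor) -> torch.Tensor:
        # Flatten (D, H, W) into one dimension, multiply element-wise,
        # then restore the original spatial shape.
        B, N = x.shape[:2]
        spatial = x.shape[2:]
        flat = x.reshape(B, N, -1) * x_cls.reshape(B, N, -1)
        return flat.reshape(B, N, *spatial)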

forward(x: Tensor)

Performs the forward pass through the network, which includes an encoding stage, a decoding stage, and a final prediction step.

Parameters:

x (torch.Tensor) – Input tensor that will be passed through the encoder, decoder, and prediction stages of the network.

Returns:

Output tensor produced after applying the encoder, decoder, and the final activation (if prediction is enabled).

Return type:

torch.Tensor

class tardis_em.cnn.utils.build_cnn.FNet(in_channels=1, out_channels=1, sigmoid=True, num_conv_layer=5, conv_layer_scaler=64, conv_kernel=3, padding=1, pool_kernel=2, dropout=None, img_patch_size=64, layer_components='3gcl', num_group=8, attn_features=False, prediction=False)

FNet model for image segmentation.

This class implements the FNet model, designed for feature extraction and image segmentation tasks. It includes an encoder for processing input features, two decoder variants (UNet and UNet3+), and a final output layer for generating a segmentation map. It supports configurable parameters for convolutional layers, kernel size, dropout, padding, pool kernel size, and other hyperparameters. The class can predict either sigmoid- or softmax-activated probability masks for segmentation.

update_patch_size(img_patch_size, prediction)

Updates the image patch sizes and rebuilds decoders with the new configuration.

This method is responsible for updating the internal patch sizes based on the provided img_patch_size and constructing new decoders (decoder_unet and decoder_3plus) using the updated patch sizes. Furthermore, it preserves the state of the existing decoder models and reloads them into the newly built decoders.

Parameters:
  • img_patch_size (int) – Initial size of the image patch to be used for calculating new patch sizes across layers.

  • prediction (bool) – Indicates if the operation is performed during prediction mode.

Returns:

This method does not return a value.

Return type:

None
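A hypothetical sketch of the preserve-and-reload step described above, using stand-in modules; the actual decoders are rebuilt internally by the library:

    import torch.nn as nn

    decoder_unet = nn.Linear(8, 8)       # stand-in for the existing decoder
    state = decoder_unet.state_dict()    # preserve the current weights
    decoder_unet = nn.Linear(8, 8)       # stand-in for the rebuilt decoder
    decoder_unet.load_state_dict(state)  # reload the preserved state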

build_cnn_model()

Build the CNN model by constructing encoder, decoder, and final prediction layers.

forward(x: Tensor)

Performs the forward pass through the network, which includes an encoding stage, a decoding stage, and a final prediction step.

Parameters:

x (torch.Tensor) – Input tensor that will be passed through the encoder, decoder, and prediction stages of the network.

Returns:

Output tensor produced after applying the encoder, decoder, and the final activation (if prediction is enabled).

Return type:

torch.Tensor

CNN Utils

tardis_em.cnn.utils.utils.number_of_features_per_level(channel_scaler: int, num_levels: int) → list

Calculates the number of features per level in a hierarchical structure, where the feature count doubles at each subsequent level starting from the initial level. The initial feature count is specified by channel_scaler, and the total number of levels is defined by num_levels.

Parameters:
  • channel_scaler (int) – Initial feature count at the first level.

  • num_levels (int) – Total number of levels in the hierarchical structure.

Returns:

A list containing the feature counts for each level, starting from the initial level up to the final level.

Return type:

list
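The doubling rule reduces to a one-line geometric progression; a sketch equivalent to the description above:

    def features_per_level_sketch(channel_scaler: int, num_levels: int) -> list:
        # Feature count doubles at each level, starting from channel_scaler.
        return [channel_scaler * 2 ** level for level in range(num_levels)]

    # features_per_level_sketch(64, 5) -> [64, 128, 256, 512, 1024]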

tardis_em.cnn.utils.utils.max_number_of_conv_layer(img=None, input_volume=64, max_out=8, kernel_size=3, padding=1, stride=1, pool_size=2, pool_stride=2, first_max_pool=False) → int

Calculates the maximum possible number of convolutional layers in a neural network, given specific image dimensions, convolution parameters, and pooling parameters. If an image is provided, the calculation uses the smallest dimension of the image for the computation.

Parameters:
  • img – An optional input tensor of image data, which may be 4D (batch_size, channels, height, width) or 5D (batch_size, channels, depth, height, width). If provided, the computation uses the smallest spatial dimension of the image.

  • input_volume – The input volume size (spatial dimension) for the computation, if no image is provided. This is the starting dimension for estimating possible convolutional layers.

  • max_out – The minimum allowable output volume size that determines when computation for convolutional layers should end.

  • kernel_size – The size of the convolutional kernel/filter.

  • padding – The number of zero-padding pixels added around each convolutional operation.

  • stride – The stride size for the convolutional operation.

  • pool_size – The size of the pooling window for down-sampling.

  • pool_stride – The stride size for pooling.

  • first_max_pool – A flag determining whether a max-pooling operation precedes the first convolutional block. If set to True, the returned layer count is reduced by one to account for the extra downsampling step.

Returns:

The maximum possible number of convolutional layers as an integer that can fit within the specified dimensions and parameters.
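A sketch of one plausible reading of this calculation: apply the standard convolution and pooling output-size formulas until the volume drops to max_out. The exact stopping rule and first_max_pool handling in the library may differ:

    def max_conv_layers_sketch(input_volume=64, max_out=8, kernel_size=3,
                               padding=1, stride=1, pool_size=2, pool_stride=2,
                               first_max_pool=False) -> int:
        layers = 0
        size = input_volume
        while size > max_out:
            size = (size + 2 * padding - kernel_size) // stride + 1  # after conv
            size = (size - pool_size) // pool_stride + 1             # after pool
            layers += 1
        if first_max_pool:
            layers -= 1  # the extra initial pooling consumes one level
        return layers

    # max_conv_layers_sketch(64, 8) -> 3 under these assumptions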

tardis_em.cnn.utils.utils.normalize_image(image: ndarray) → ndarray

Normalizes the given image by checking its minimum and maximum values. The function ensures that the image is either normalized between 0 and 1 or converted into a binary representation based on the pixel intensity.

Parameters:

image (np.ndarray) – An array representing the image to be normalized.

Returns:

A normalized image array.

Return type:

np.ndarray
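A hedged sketch of the described behavior: an image already in [0, 1] is returned unchanged, otherwise it is min-max rescaled. The library’s exact rule for producing a binary representation may differ:

    import numpy as np

    def normalize_image_sketch(image: np.ndarray) -> np.ndarray:
        lo, hi = float(image.min()), float(image.max())
        if 0.0 <= lo and hi <= 1.0:
            return image                      # already in [0, 1] (or binary)
        if hi == lo:
            return np.zeros_like(image)       # constant image: degenerate case
        return (image - lo) / (hi - lo)       # min-max rescale into [0, 1]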

tardis_em.cnn.utils.utils.check_model_dict(model_dict: dict) → dict

Analyzes the provided dictionary containing model configuration and maps its keys to a standardized format for further processing. This function ensures that specific key patterns in the input dictionary are identified and their corresponding values are transferred into a new dictionary with standardized key names.

Parameters:

model_dict (dict) – The input dictionary containing model configuration settings. Keys may represent various model attributes and must conform to specific naming patterns to be mapped accordingly.

Returns:

A new dictionary containing remapped key-value pairs, where keys follow a standardized naming convention. Only keys matching the predefined patterns are included in the output dictionary.

Return type:

dict
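A sketch of the remapping idea with hypothetical key patterns; the actual patterns and standardized names are defined inside the library:

    def check_model_dict_sketch(model_dict: dict) -> dict:
        # Hypothetical pattern -> standardized-name table, for illustration
        # only; the real patterns live inside the library.
        patterns = {
            "in_channel": "in_channels",
            "out_channel": "out_channels",
            "conv_scaler": "conv_layer_scaler",
        }
        new_dict = {}
        for key, value in model_dict.items():
            for pattern, standard in patterns.items():
                if pattern in key:
                    new_dict[standard] = value
        return new_dict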