blocks

Flexible and composable building blocks for constructing neural networks.

After features are embedded and combined, it is time to extract as much information as possible to predict the desired target. One way of doing this systematically is to repeat layers of identical internal architecture with residual (or skip) connections between them.

class IdentityBlock(mod_dim, *_, **__)[source]

Bases: Block

PyTorch module that passes a tensor right through, doing nothing.

This is a placeholder for situations where a default Module is required that provides not only a reset_parameters() method but also a new() method, as well as the mod_dim, device, and dtype properties. Providing any number of (keyword) arguments on instantiation is permitted, but they are ignored.

Parameters:

mod_dim (int) – Ignored but mandatory to maintain API compatibility.

property device

Just for API compatibility. Always returns None.

property dtype

Just for API compatibility. Always returns None.

forward(tensor, *_, **__)[source]

Pass through first argument, ignore additional (keyword) arguments.

Parameters:

tensor (Tensor) – Any argument (typically a tensor) to be passed straight through.

Returns:

The tensor passed in as argument.

Return type:

Tensor

property mod_dim

The model dimension.

new()[source]

Return a fresh, new instance.

Providing any number of (keyword) arguments is permitted, but they will be ignored.

Returns:

A fresh, new instance of itself.

Return type:

IdentityBlock

reset_parameters()[source]

Does nothing because there are no internal parameters to reset.
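To illustrate the interface an IdentityBlock satisfies, here is a minimal plain-PyTorch sketch. It is not the swak source: the class name PassThroughSketch is hypothetical, and storing mod_dim so that the mod_dim property can report it is an assumption.

```python
import torch
from torch import nn


class PassThroughSketch(nn.Module):
    """Hypothetical stand-in mirroring the IdentityBlock interface."""

    def __init__(self, mod_dim: int, *_, **__) -> None:
        super().__init__()
        self._mod_dim = mod_dim  # assumption: kept only so mod_dim can report it

    @property
    def mod_dim(self) -> int:
        return self._mod_dim

    @property
    def device(self) -> None:
        return None  # no parameters, hence no device

    @property
    def dtype(self) -> None:
        return None  # no parameters, hence no dtype

    def forward(self, tensor, *_, **__):
        return tensor  # the first argument passes straight through

    def new(self, *_, **__) -> "PassThroughSketch":
        return self.__class__(self.mod_dim)

    def reset_parameters(self) -> None:
        pass  # nothing to reset


block = PassThroughSketch(16, "ignored", flag=True)
x = torch.randn(2, 16)
y = block(x, "also ignored", mask=None)  # extra (keyword) arguments are dropped
```

Because the sketch has no parameters, it can be slotted in wherever a block is expected without changing the parameter count of the surrounding model.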

class ActivatedBlock(mod_dim, activate=ELU(alpha=1.0), bias=True, device='cpu', dtype=torch.float32)[source]

Bases: Block

A single, non-linearly activated layer.

Parameters:
  • mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.

  • activate (Module or function, optional) – The activation function to be applied after (linearly) projecting the input. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to ELU().

  • bias (bool, optional) – Whether to add a learnable bias vector to the projection. Defaults to True.

  • device (str or torch.device, optional) – Torch device to first create the block on. Defaults to “cpu”.

  • dtype (torch.dtype, optional) – Torch dtype to first create the block in. Defaults to torch.float.
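Assuming the block consists of one linear layer from mod_dim to mod_dim followed by the activation (the exact internals are an assumption, and ActivatedLayerSketch is a hypothetical name, not the swak class), its computation can be sketched in plain PyTorch as follows.

```python
import torch
from torch import nn


class ActivatedLayerSketch(nn.Module):
    """Hypothetical single activated layer that maps mod_dim to mod_dim."""

    def __init__(self, mod_dim, activate=nn.ELU(), bias=True, device="cpu", dtype=torch.float32):
        super().__init__()
        self.activate = activate
        self.project = nn.Linear(mod_dim, mod_dim, bias=bias, device=device, dtype=dtype)

    def forward(self, inp):
        # Linear projection within the feature space, then the non-linearity.
        return self.activate(self.project(inp))

    def reset_parameters(self):
        self.project.reset_parameters()


# A plain function works as well as a module, as long as it maps tensor -> tensor.
layer = ActivatedLayerSketch(8, activate=nn.functional.gelu)
out = layer(torch.randn(4, 3, 8))  # leading dimensions are arbitrary
```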

property device

The device all weights, biases, activations, etc. reside on.

property dtype

The dtype of all weights, biases, activations, and parameters.

forward(inp)[source]

Forward pass through a single, non-linearly activated layer.

Parameters:

inp (Tensor) – The size of the last dimension is expected to be mod_dim.

Returns:

Same dimensions and sizes as the input tensor.

Return type:

Tensor

property mod_dim

The model dimension.

new()[source]

Return a fresh, new instance with exactly the same parameters.

Returns:

A fresh, new instance of itself.

Return type:

ActivatedBlock

reset_parameters()[source]

Re-initialize the internal parameters of the block.

class ActivatedHiddenBlock(mod_dim, activate=ELU(alpha=1.0), factor=4, bias=True, device='cpu', dtype=torch.float32)[source]

Bases: Block

A single, non-linearly activated hidden layer of configurable size.

Parameters:
  • mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.

  • activate (Module or function, optional) – The activation function to be applied after projecting into higher-dimensional space. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to ELU().

  • factor (int, optional) – The size of the hidden layer is this integer factor times mod_dim. Defaults to 4.

  • bias (bool, optional) – Whether to add a learnable bias vector to the projections. Defaults to True.

  • device (str or torch.device, optional) – Torch device to first create the block on. Defaults to “cpu”.

  • dtype (torch.dtype, optional) – Torch dtype to first create the block in. Defaults to torch.float.
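A minimal plain-PyTorch sketch of the widen, activate, and project-back pattern described above, assuming two linear layers of sizes mod_dim → factor * mod_dim → mod_dim. The class name ActivatedHiddenSketch is hypothetical and this is not the swak source.

```python
import torch
from torch import nn


class ActivatedHiddenSketch(nn.Module):
    """Hypothetical activated hidden layer of size factor * mod_dim."""

    def __init__(self, mod_dim, activate=nn.ELU(), factor=4, bias=True, device="cpu", dtype=torch.float32):
        super().__init__()
        hidden = factor * mod_dim
        self.widen = nn.Linear(mod_dim, hidden, bias=bias, device=device, dtype=dtype)
        self.activate = activate
        self.project = nn.Linear(hidden, mod_dim, bias=bias, device=device, dtype=dtype)

    def forward(self, inp):
        # Project up, apply the non-linearity, and project back down.
        return self.project(self.activate(self.widen(inp)))


block = ActivatedHiddenSketch(8, factor=4)
out = block(torch.randn(2, 8))  # hidden size is 4 * 8 = 32, output size is 8 again
```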

property device

The device all weights, biases, activations, etc. reside on.

property dtype

The dtype of all weights, biases, activations, and parameters.

forward(inp)[source]

Forward pass through a single, non-linearly activated hidden layer.

Parameters:

inp (Tensor) – The size of the last dimension is expected to be mod_dim.

Returns:

Same dimensions and sizes as the input tensor.

Return type:

Tensor

property mod_dim

The model dimension.

new()[source]

Return a fresh, new instance with exactly the same parameters.

Returns:

A fresh, new instance of itself.

Return type:

ActivatedHiddenBlock

reset_parameters()[source]

Re-initialize the internal parameters of the block.

class GatedBlock(mod_dim, gate=Sigmoid(), bias=True, device='cpu', dtype=torch.float32)[source]

Bases: Block

A configurable, gated linear unit (GLU).

Parameters:
  • mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.

  • gate (Module or function, optional) – The activation function to be applied to half of the (linearly) projected input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.

  • bias (bool, optional) – Whether to add a learnable bias vector to the projections. Defaults to True.

  • device (str or torch.device, optional) – Torch device to first create the block on. Defaults to “cpu”.

  • dtype (torch.dtype, optional) – Torch dtype to first create the block in. Defaults to torch.float.
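The gating described for the gate parameter can be sketched in plain PyTorch as below, assuming a single projection to 2 * mod_dim whose first half is multiplied by the gated second half. Which half gets gated is an assumption, and GLUSketch is a hypothetical name, not the swak class.

```python
import torch
import torch.nn.functional as F
from torch import nn


class GLUSketch(nn.Module):
    """Hypothetical gated linear unit mirroring the GatedBlock description."""

    def __init__(self, mod_dim, gate=nn.Sigmoid(), bias=True, device="cpu", dtype=torch.float32):
        super().__init__()
        self.gate = gate
        # A single projection yields both halves: values and gate inputs.
        self.project = nn.Linear(mod_dim, 2 * mod_dim, bias=bias, device=device, dtype=dtype)

    def forward(self, inp):
        # Assumption: the first half is gated by the second half.
        value, gate_in = self.project(inp).chunk(2, dim=-1)
        return value * self.gate(gate_in)


glu = GLUSketch(8)
x = torch.randn(5, 8)
out = glu(x)
# With the default sigmoid gate, this coincides with PyTorch's built-in GLU.
reference = F.glu(glu.project(x), dim=-1)
```

Swapping the gate, e.g. for torch.nn.functional.silu, turns the same structure into a SwiGLU-style unit.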

property device

The device all weights, biases, activations, etc. reside on.

property dtype

The dtype of all weights, biases, activations, and parameters.

forward(inp)[source]

Forward pass through a single gated linear unit (GLU).

Parameters:

inp (Tensor) – The size of the last dimension is expected to be mod_dim.

Returns:

Same dimensions and sizes as the input tensor.

Return type:

Tensor

property mod_dim

The model dimension.

new()[source]

Return a fresh, new instance with exactly the same parameters.

Returns:

A fresh, new instance of itself.

Return type:

GatedBlock

reset_parameters()[source]

Re-initialize the internal parameters of the block.

class GatedHiddenBlock(mod_dim, gate=Sigmoid(), factor=4, bias=True, device='cpu', dtype=torch.float32)[source]

Bases: Block

A configurable, gated linear unit (GLU) with a single hidden layer.

Parameters:
  • mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.

  • gate (Module or function, optional) – The activation function to be applied to half of the (linearly) projected input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.

  • factor (int, optional) – The size of the hidden layer is this integer factor times mod_dim. Defaults to 4.

  • bias (bool, optional) – Whether to add a learnable bias vector to the projections. Defaults to True.

  • device (str or torch.device, optional) – Torch device to first create the block on. Defaults to “cpu”.

  • dtype (torch.dtype, optional) – Torch dtype to first create the block in. Defaults to torch.float.
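One plausible arrangement, sketched in plain PyTorch under stated assumptions: the input is projected to 2 * factor * mod_dim, gated down to factor * mod_dim (the size reported by the dim property), and projected back to mod_dim. This layout and the name GatedHiddenSketch are assumptions, not the swak source.

```python
import torch
from torch import nn


class GatedHiddenSketch(nn.Module):
    """Hypothetical GLU whose gated hidden layer has factor * mod_dim units."""

    def __init__(self, mod_dim, gate=nn.Sigmoid(), factor=4, bias=True, device="cpu", dtype=torch.float32):
        super().__init__()
        self.dim = factor * mod_dim  # hidden size after gating (cf. the dim property)
        self.widen = nn.Linear(mod_dim, 2 * self.dim, bias=bias, device=device, dtype=dtype)
        self.gate = gate
        self.project = nn.Linear(self.dim, mod_dim, bias=bias, device=device, dtype=dtype)

    def forward(self, inp):
        # Gate half of the widened projection with the other half ...
        value, gate_in = self.widen(inp).chunk(2, dim=-1)
        # ... and project the gated hidden representation back to mod_dim.
        return self.project(value * self.gate(gate_in))


block = GatedHiddenSketch(8, factor=4)
out = block(torch.randn(3, 8))
```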

property device

The device all weights, biases, activations, etc. reside on.

property dim

The hidden dimension after gating.

property dtype

The dtype of all weights, biases, activations, and parameters.

forward(inp)[source]

Forward pass through a gated linear unit (GLU) with a hidden layer.

Parameters:

inp (Tensor) – The size of the last dimension is expected to be mod_dim.

Returns:

Same dimensions and sizes as the input tensor.

Return type:

Tensor

property mod_dim

The model dimension.

new()[source]

Return a fresh, new instance with exactly the same parameters.

Returns:

A fresh, new instance of itself.

Return type:

GatedHiddenBlock

reset_parameters()[source]

Re-initialize the internal parameters of the block.

class GatedActivatedBlock(mod_dim, activate=ELU(alpha=1.0), gate=Sigmoid(), bias=True, device='cpu', dtype=torch.float32)[source]

Bases: Block

Gated Residual Network (GRN) for efficiently extracting information.

Parameters:
  • mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.

  • activate (Module or function, optional) – The activation function to be applied after (linear) projection, but prior to gating. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to an ELU activation.

  • gate (Module or function, optional) – The activation function to be applied to half of the (non-linearly) projected input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.

  • bias (bool, optional) – Whether to add a learnable bias vector in the projections. Defaults to True.

  • device (str or torch.device, optional) – Torch device to first create the block on. Defaults to “cpu”.

  • dtype (torch.dtype, optional) – Torch dtype to first create the block in. Defaults to torch.float.

Note

Inspired by the Gated Residual Network (GRN) introduced in [1], this module (linearly) projects the input, applies a non-linearity, and gates the result by the gate activation (a sigmoid by default) of a projection of the same intermediate representation, giving the model per-dimension control over how much of the non-linearity contributes to the output.
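Following the description in the note above, a plain-PyTorch sketch might look like this. The layer sizes (mod_dim → mod_dim, then mod_dim → 2 * mod_dim for the gate) and the name GRNSketch are assumptions, not the swak source. As in the note, the residual addition itself is not part of this block.

```python
import torch
from torch import nn


class GRNSketch(nn.Module):
    """Hypothetical GRN-style block: project, activate, then gate."""

    def __init__(self, mod_dim, activate=nn.ELU(), gate=nn.Sigmoid(), bias=True, device="cpu", dtype=torch.float32):
        super().__init__()
        kwargs = dict(bias=bias, device=device, dtype=dtype)
        self.project = nn.Linear(mod_dim, mod_dim, **kwargs)
        self.activate = activate
        self.split = nn.Linear(mod_dim, 2 * mod_dim, **kwargs)
        self.gate = gate

    def forward(self, inp):
        # Non-linear intermediate representation.
        hidden = self.activate(self.project(inp))
        # One half of its projection is scaled, per dimension, by the gated other half.
        value, gate_in = self.split(hidden).chunk(2, dim=-1)
        return value * self.gate(gate_in)


x = torch.randn(3, 8)
out = GRNSketch(8)(x)
# A gate that always outputs 0 switches the block off entirely.
closed = GRNSketch(8, gate=lambda t: torch.zeros_like(t))
```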

References

property device

The device all weights, biases, activations, etc. reside on.

property dtype

The dtype of all weights, biases, activations, and parameters.

forward(inp)[source]

Forward pass through a single gated residual network (GRN).

Parameters:

inp (Tensor) – The size of the last dimension is expected to be mod_dim.

Returns:

Same dimensions and sizes as the input tensor.

Return type:

Tensor

property mod_dim

The model dimension.

new()[source]

Return a fresh, new instance with exactly the same parameters.

Returns:

A fresh, new instance of itself.

Return type:

GatedActivatedBlock

reset_parameters()[source]

Re-initialize the internal parameters of the block.

class SkipConnection(block, dropout=0.0, norm_first=True, norm_cls=<class 'swak.pt.misc.identity.Identity'>, *args, device='cpu', dtype=torch.float32, **kwargs)[source]

Bases: Block

Add a residual/skip connection around the wrapped neural-network block.

Parameters:
  • block (Block) – The block to wrap the residual/skip connection around. This cannot simply be any Module because PyTorch currently provides no reasonable way of cloning modules.

  • dropout (float, optional) – The amount of dropout to apply to the block’s output before adding it back to the input. Defaults to 0.

  • norm_first (bool, optional) – If True, normalize the inputs before passing them through the block and adding the outputs to the raw inputs. If False, pass inputs through the block first and normalize the sum of the inputs and outputs afterward. Defaults to True.

  • norm_cls (type, optional) – The class of the norm to apply, e.g., LayerNorm or BatchNorm1d. Depending on norm_first, it normalizes either the inputs to the block or the sum of inputs and outputs. Again, the class (rather than an instance) is needed to easily create fresh, new instances with equal, but independent parameters. Defaults to Identity, resulting in no normalization.

  • *args – Arguments used to initialize an instance of norm_cls.

  • device (str or torch.device, optional) – Torch device to first create the block on. Defaults to “cpu”.

  • dtype (torch.dtype, optional) – Torch dtype to first create the block in. Defaults to torch.float.

  • **kwargs – Keyword arguments used to initialize an instance of norm_cls.
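The two orderings selected by norm_first can be sketched in plain PyTorch as below. This is a simplified illustration, not the swak source: the name SkipSketch is hypothetical, the device and dtype handling is omitted, and any Module is accepted as block.

```python
import torch
from torch import nn


class SkipSketch(nn.Module):
    """Hypothetical residual wrapper showing pre-norm vs. post-norm."""

    def __init__(self, block, dropout=0.0, norm_first=True, norm_cls=nn.Identity, *args, **kwargs):
        super().__init__()
        self.block = block
        self.drop = nn.Dropout(dropout)
        self.norm_first = norm_first
        # Instantiating from the class keeps each copy's norm independent.
        self.norm = norm_cls(*args, **kwargs)

    def forward(self, inp):
        if self.norm_first:
            # Pre-norm: normalize, run the block, and add to the raw input.
            return inp + self.drop(self.block(self.norm(inp)))
        # Post-norm: run the block, add the input, then normalize the sum.
        return self.norm(inp + self.drop(self.block(inp)))


x = torch.randn(5, 8)
# Positional arguments after norm_cls (here: 8) are forwarded to LayerNorm.
pre = SkipSketch(nn.Linear(8, 8), 0.1, True, nn.LayerNorm, 8)
post = SkipSketch(nn.Linear(8, 8), 0.1, False, nn.LayerNorm, 8)
```

With post-norm, every output row is normalized; with pre-norm, the output is the raw input plus an update, which is why a trailing norm may still be needed.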

See also

Identity

property device

The device all weights, biases, activations, etc. reside on.

property dtype

The dtype of all weights, biases, activations, and parameters.

forward(inp)[source]

Forward pass through a block with the input added to the output.

Parameters:

inp (Tensor) – The size of the last dimension is expected to be mod_dim.

Returns:

Same dimensions and sizes as the input tensor.

Return type:

Tensor

property mod_dim

The model dimension.

new()[source]

Return a fresh, new instance with exactly the same parameters.

Returns:

A fresh, new instance of itself.

Return type:

SkipConnection

reset_parameters()[source]

Re-initialize the internal parameters of the block and the norm.

class Repeat(skip, n_layers=2, device='cpu', dtype=torch.float32)[source]

Bases: Block

Repeat a skip-connection, distilling ever finer detail from your data.

Parameters:
  • skip (SkipConnection) – An instance of a SkipConnection to repeat.

  • n_layers (int, optional) – How many times to repeat the skip connection. Defaults to 2.

  • device (str or torch.device, optional) – Torch device to first create the blocks on. Defaults to “cpu”.

  • dtype (torch.dtype, optional) – Torch dtype to first create the blocks in. Defaults to torch.float.

Raises:
  • TypeError – If n_layers is not an integer.

  • ValueError – If n_layers is smaller than 1.

Note

If the skip-connection sets norm_first to True, no norm will be applied to the final output of the last repetition. If a trailing norm is desired, it should be applied externally, after this module.
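A minimal plain-PyTorch sketch of the repetition, including the input validation listed under Raises and an external trailing norm for the pre-norm case. It is not the swak source: Residual and RepeatSketch are hypothetical names, and copy.deepcopy followed by re-initialization stands in for calling new() on the skip connection.

```python
import copy

import torch
from torch import nn


class Residual(nn.Module):
    """Hypothetical pre-norm residual layer used as the unit to repeat."""

    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.linear = nn.Linear(dim, dim)

    def forward(self, x):
        return x + self.linear(self.norm(x))


class RepeatSketch(nn.Module):
    """Hypothetical stack of independent copies of one residual layer."""

    def __init__(self, skip, n_layers=2):
        super().__init__()
        if not isinstance(n_layers, int):
            raise TypeError("n_layers must be an integer")
        if n_layers < 1:
            raise ValueError("n_layers must be at least 1")
        # Same architecture, separate tensors (stand-in for skip.new()).
        self.stack = nn.ModuleList([copy.deepcopy(skip) for _ in range(n_layers)])
        self.reset_parameters()

    def reset_parameters(self):
        # Re-draw the weights so the copies do not start out identical.
        for module in self.stack.modules():
            if hasattr(module, "reset_parameters"):
                module.reset_parameters()

    def forward(self, inp):
        for layer in self.stack:
            inp = layer(inp)
        return inp


rep = RepeatSketch(Residual(8), n_layers=3)
# Pre-norm layers leave the final output unnormalized, so add a trailing norm.
model = nn.Sequential(rep, nn.LayerNorm(8))
out = model(torch.randn(4, 8))
```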

See also

SkipConnection

property device

The device all weights, biases, activations, etc. reside on.

property dtype

The dtype of all weights, biases, activations, and parameters.

forward(inp)[source]

Forward pass through a stack of identical skip-connection blocks.

Parameters:

inp (Tensor) – The size of the last dimension is expected to be mod_dim.

Returns:

Same dimensions and sizes as the input tensor.

Return type:

Tensor

property layers

Range of layer indices.

property mod_dim

The model dimension.

new()[source]

Return a fresh, new instance with exactly the same parameters.

Returns:

A fresh, new instance of itself.

Return type:

Repeat

reset_parameters()[source]

Re-initialize the internal parameters of all blocks.

Base classes

class Resettable(*_, **__)[source]

Bases: Module, ABC

Abstract base class for Modules with a reset_parameters().

abstractmethod reset_parameters()[source]

Subclasses implement in-place reset of all internal parameters.

class Block(*_, **__)[source]

Bases: Resettable

Abstract base class for neural-network components.

abstract property device

Return the device that parameters/weights live on.

abstract property dtype

Return the dtype of parameters/weights.

abstract property mod_dim

Return the embedding dimension of the module.

abstractmethod new()[source]

Return a fresh, new instance with exactly the same parameters.
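To show what a custom block has to provide, here is a sketch that mirrors the two base classes with abc and plain PyTorch and implements one concrete block. ResettableSketch, BlockSketch, and ScaleBlock are hypothetical names; the real base classes live in swak and may differ in detail.

```python
from abc import ABC, abstractmethod

import torch
from torch import nn


class ResettableSketch(nn.Module, ABC):
    """Hypothetical mirror of Resettable: a Module that can reset itself."""

    @abstractmethod
    def reset_parameters(self) -> None:
        """Reset all internal parameters in place."""


class BlockSketch(ResettableSketch):
    """Hypothetical mirror of Block: adds mod_dim, device, dtype, and new()."""

    @property
    @abstractmethod
    def mod_dim(self) -> int: ...

    @property
    @abstractmethod
    def device(self) -> torch.device: ...

    @property
    @abstractmethod
    def dtype(self) -> torch.dtype: ...

    @abstractmethod
    def new(self) -> "BlockSketch": ...


class ScaleBlock(BlockSketch):
    """Hypothetical concrete block: learnable per-feature scaling."""

    def __init__(self, mod_dim, device="cpu", dtype=torch.float32):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(mod_dim, device=device, dtype=dtype))

    @property
    def mod_dim(self) -> int:
        return self.scale.shape[0]

    @property
    def device(self) -> torch.device:
        return self.scale.device

    @property
    def dtype(self) -> torch.dtype:
        return self.scale.dtype

    def forward(self, inp):
        return inp * self.scale

    def new(self) -> "ScaleBlock":
        # Same constructor arguments, freshly initialized parameters.
        return self.__class__(self.mod_dim, self.device, self.dtype)

    def reset_parameters(self) -> None:
        with torch.no_grad():
            self.scale.fill_(1.0)


try:
    BlockSketch()  # abstract members are missing, so this must fail
    abstract_rejected = False
except TypeError:
    abstract_rejected = True

block = ScaleBlock(4)
fresh = block.new()
```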