blocks

Flexible and composable building blocks for constructing neural networks.

After features are embedded and combined, it is time to extract as much information as possible to predict the desired target. One way of doing this systematically is to repeat layers of identical internal architecture with residual (or skip) connections between them.

class ActivatedBlock(mod_dim, activate=ELU(alpha=1.0), **kwargs)[source]

Bases: Block

A single, non-linearly activated layer.

Parameters:

mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
activate (Module or function, optional) – The activation function to be applied after the affine transformation. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to ELU().
**kwargs – Additional keyword arguments to pass through to the linear layers.

forward(inp)[source]

Forward pass through a single, non-linearly activated layer.

Parameters: inp (Tensor) – The size of the last dimension is expected to be mod_dim.
Returns: Same dimensions and sizes as the input tensor.
Return type: Tensor

new()[source]: Return a fresh, new instance with exactly the same parameters.

reset_parameters()[source]: Re-initialize the internal parameters of the linear projections.
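
To make "activation applied after the affine transformation" concrete, here is a minimal, dependency-free sketch that uses plain Python lists in place of tensors. The names `elu` and `activated_layer` are hypothetical and not part of this module; the real block operates on tensors and learns its weights.

```python
import math


def elu(x, alpha=1.0):
    """ELU: identity for positive inputs, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * math.expm1(x)


def activated_layer(inp, weight, bias, activate=elu):
    """Apply `activate` element-wise after an affine transformation."""
    return [
        activate(sum(w * x for w, x in zip(row, inp)) + b)
        for row, b in zip(weight, bias)
    ]


# A square weight matrix keeps the size of the last dimension (mod_dim = 2).
identity = [[1.0, 0.0], [0.0, 1.0]]
out = activated_layer([1.0, -1.0], identity, [0.0, 0.0])
```

Because the weight matrix is square, the output has the same size as the input, which is the property that makes such a layer stackable.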

class ActivatedHiddenBlock(mod_dim, activate=ELU(alpha=1.0), drop=Dropout(p=0.0, inplace=False), hidden_factor=4, **kwargs)[source]

Bases: Block

A single, non-linearly activated hidden layer of configurable size.

Parameters:

mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
activate (Module or function, optional) – The activation function to be applied after projecting into higher-dimensional space. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to ELU().
drop (Module, optional) – Dropout to be applied after activation. Typically an instance of Dropout or AlphaDropout. Defaults to Dropout(p=0.0), resulting in no dropout being applied.
hidden_factor (int, optional) – The size of the hidden layer is this integer factor times mod_dim. Defaults to 4.
**kwargs – Additional keyword arguments to pass through to the linear layers.

forward(inp)[source]

Forward pass through a single, non-linearly activated hidden layer.

Parameters: inp (Tensor) – The size of the last dimension is expected to be mod_dim.
Returns: Same dimensions and sizes as the input tensor.
Return type: Tensor

new()[source]: Return a fresh, new instance with exactly the same parameters.

reset_parameters()[source]: Re-initialize the internal parameters of the block.

class GatedBlock(mod_dim, gate=Sigmoid(), **kwargs)[source]

Bases: Block

A configurable, gated linear unit (GLU).

Parameters:

mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
gate (Module or function, optional) – The activation function to be applied to half of the (linearly) projected input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.
**kwargs – Additional keyword arguments to pass through to the linear layers.

forward(inp)[source]

Forward pass through a single gated linear unit (GLU).

Parameters: inp (Tensor) – The size of the last dimension is expected to be mod_dim.
Returns: Same dimensions and sizes as the input tensor.
Return type: Tensor

new()[source]: Return a fresh, new instance with exactly the same parameters.

reset_parameters()[source]: Re-initialize the internal parameters of the block.
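
The gating step described for the gate parameter can be sketched without any dependencies, again with plain lists standing in for tensors. The helper names `sigmoid` and `glu` are hypothetical, and which half of the projection is passed through the gate is an assumption of this sketch, not something the documentation above specifies.

```python
import math


def sigmoid(x):
    """Logistic function, standing in for the default gate."""
    return 1.0 / (1.0 + math.exp(-x))


def glu(projected, gate=sigmoid):
    """Gate one half of `projected` and multiply it with the other half."""
    if len(projected) % 2:
        raise ValueError("The projected vector must have even length.")
    half = len(projected) // 2
    signal, gates = projected[:half], projected[half:]
    # Assumption: the second half is the one passed through the gate.
    return [s * gate(g) for s, g in zip(signal, gates)]


# A projection of size 2 * mod_dim = 4 is gated back down to mod_dim = 2.
gated = glu([1.0, -2.0, 0.0, 100.0])
```

A gate value near 0 suppresses the corresponding signal and a value near 1 lets it pass almost unchanged, so the second element above comes out close to -2.0.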

class GatedHiddenBlock(mod_dim, gate=Sigmoid(), drop=Dropout(p=0.0, inplace=False), hidden_factor=4, **kwargs)[source]

Bases: Block

A configurable, gated linear unit (GLU) with single hidden layer.

Parameters:

mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
gate (Module or function, optional) – The activation function to be applied to half of the (linearly) projected input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.
drop (Module, optional) – Dropout to be applied after gating. Typically an instance of Dropout or AlphaDropout. Defaults to Dropout(p=0.0), resulting in no dropout being applied.
hidden_factor (int, optional) – The size of the hidden layer is this integer factor times mod_dim before gating, which then halves it. Defaults to 4.
**kwargs – Additional keyword arguments to pass through to the linear layers.

property dim: The hidden dimension after gating.

forward(inp)[source]

Forward pass through a gated linear unit (GLU) with a hidden layer.

Parameters: inp (Tensor) – The size of the last dimension is expected to be mod_dim.
Returns: Same dimensions and sizes as the input tensor.
Return type: Tensor

new()[source]: Return a fresh, new instance with exactly the same parameters.

reset_parameters()[source]: Re-initialize the internal parameters of the block.
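
The size bookkeeping implied by hidden_factor and the dim property can be sketched as follows. The function `gated_hidden_sizes` is hypothetical, and the sketch assumes that the block first widens to hidden_factor times mod_dim, gating then halves that width, and a final projection restores mod_dim.

```python
def gated_hidden_sizes(mod_dim, hidden_factor=4):
    """Return the assumed sizes (input, widened, after gating, output)."""
    widened = hidden_factor * mod_dim
    if widened % 2:
        # Assumption of this sketch: an odd width cannot be split in half.
        raise ValueError("hidden_factor * mod_dim must be even.")
    after_gating = widened // 2  # what the `dim` property is said to report
    return mod_dim, widened, after_gating, mod_dim


sizes = gated_hidden_sizes(mod_dim=8)
```

With the defaults, a feature space of size 8 would be widened to 32, gated down to 16, and projected back to 8.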

class ActivatedGatedBlock(mod_dim, activate=ELU(alpha=1.0), gate=Sigmoid(), drop=Dropout(p=0.0, inplace=False), hidden_factor=4, **kwargs)[source]

Bases: Block

An activated, hidden layer, followed by a gated linear unit (GLU).

Parameters:

mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
activate (Module or function, optional) – The activation function to be applied after (linear) projection, but prior to gating. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to an ELU activation.
gate (Module or function, optional) – The activation function to be applied to half of the (non-linearly) projected input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.
drop (Module, optional) – Dropout to be applied before gating. Typically an instance of Dropout or AlphaDropout. Defaults to Dropout(p=0.0), resulting in no dropout being applied.
hidden_factor (int, optional) – The size of the hidden layer is this integer factor times mod_dim. Defaults to 4.
**kwargs – Additional keyword arguments to pass through to the linear layers.

forward(inp)[source]

Forward pass through an activated, hidden layer followed by a GLU.

Parameters: inp (Tensor) – The size of the last dimension is expected to be mod_dim.
Returns: Same dimensions and sizes as the input tensor.
Return type: Tensor

new()[source]: Return a fresh, new instance with exactly the same parameters.

reset_parameters()[source]: Re-initialize the internal parameters of the block.

class GatedResidualBlock(mod_dim, activate=ELU(alpha=1.0), gate=Sigmoid(), drop=Dropout(p=0.0, inplace=False), **kwargs)[source]

Bases: Block

Gated Residual Network (GRN) for efficiently extracting information.

Parameters:

mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
activate (Module or function, optional) – The activation function to be applied after (linear) projection, but prior to gating. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to an ELU activation.
gate (Module or function, optional) – The activation function to be applied to half of the (non-linearly) projected input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.
drop (Module, optional) – Dropout to be applied within the GRN. Typically an instance of Dropout or AlphaDropout. Defaults to Dropout(p=0.0), resulting in no dropout being applied.
**kwargs – Additional keyword arguments to pass through to the linear layers.

Note

This implementation is inspired by how features are encoded in Temporal Fusion Transformers [1], but it is not quite the same. Firstly, the intermediate linear layer (Eq. 3) is eliminated and dropout is applied directly to the activations after the first layer. Secondly, the layer norm (Eq. 2) is replaced by simply dividing the sum of (linearly projected) input and gated signal by 2. Should additional normalization be desired, it can be performed independently on the output of this module.

References

forward(inp)[source]

Forward pass through a single gated residual network (GRN).

Parameters: inp (Tensor) – The size of the last dimension is expected to be mod_dim.
Returns: Same dimensions and sizes as the input tensor.
Return type: Tensor

new()[source]: Return a fresh, new instance with exactly the same parameters.

reset_parameters()[source]: Re-initialize the internal parameters of the block.
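
The note for this class says that the layer norm is replaced by dividing the sum of the projected input and the gated signal by 2. A minimal sketch of that final step only, with plain lists standing in for tensors and a hypothetical helper name `combine`, might look like this. How the two inputs are computed inside the block is not reproduced here.

```python
def combine(projected, gated):
    """Average the projected input and the gated signal element-wise."""
    if len(projected) != len(gated):
        raise ValueError("Both inputs must have the same length.")
    return [(p + g) / 2 for p, g in zip(projected, gated)]


averaged = combine([1.0, 3.0], [3.0, 5.0])
```

Each output element lies between the two values it averages, so repeated application does not grow the signal the way a plain sum could.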

class SkipConnection(block, drop=Dropout(p=0.0, inplace=False), norm_first=True, norm_cls=<class 'swak.pt.misc.Identity'>, *args, **kwargs)[source]

Bases: Block

Add a residual/skip connection around the wrapped neural-network block.

Parameters:

block (Block) – The block to wrap the residual/skip connection around. The reason why this cannot simply be a Module is that currently PyTorch does not provide a reasonable way of cloning them.
drop (Module, optional) – Dropout to be applied to the output of block before adding it to its input. Typically an instance of Dropout or AlphaDropout. Defaults to Dropout(p=0.0), resulting in no dropout being applied.
norm_first (bool, optional) – If True, normalize the inputs before passing them through the block and adding the outputs to the raw inputs. If False, pass inputs through the block first and normalize the sum of the inputs and outputs afterward. Defaults to True.
norm_cls (type, optional) – The class of the norm to be applied after adding input to output, e.g., LayerNorm or BatchNorm1d. Again, this is needed to easily create fresh, new instances with equal, but independent parameters. Defaults to Identity, resulting in no normalization whatsoever.
*args – Arguments used to initialize an instance of norm_cls.
**kwargs – Keyword arguments used to initialize an instance of norm_cls.

forward(inp)[source]

Forward pass through a block with the input added to the output.

Parameters: inp (Tensor) – The size of the last dimension is expected to be mod_dim.
Returns: Same dimensions and sizes as the input tensor.
Return type: Tensor

new()[source]: Return a fresh, new instance with exactly the same parameters.

reset_parameters()[source]: Re-initialize the internal parameters of the block and the norm.
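
The two orderings selected by norm_first can be sketched with ordinary callables and floats in place of modules and tensors. The function `skip_forward` and the toy `block` and `norm` below are hypothetical stand-ins that follow the parameter description above; they are not the library's implementation.

```python
def skip_forward(inp, block, norm, drop=lambda x: x, norm_first=True):
    """Add `inp` to the output of `block`, normalizing before or after."""
    if norm_first:
        # Normalize the input, run the block, add the result to the raw input.
        return inp + drop(block(norm(inp)))
    # Run the block on the raw input, then normalize the sum.
    return norm(inp + drop(block(inp)))


def block(x):
    """Toy block that doubles its input."""
    return 2 * x


def norm(x):
    """Toy norm that divides its input by 10."""
    return x / 10


pre = skip_forward(10.0, block, norm, norm_first=True)
post = skip_forward(10.0, block, norm, norm_first=False)
```

Note that with norm_first=True the returned sum itself is not normalized, whereas with norm_first=False it is.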

class Repeat(skip, n_layers=2)[source]

Bases: Block

Repeat a skip-connection, distilling ever finer detail from your data.

Parameters:

skip (SkipConnection) – An instance of a SkipConnection to repeat.
n_layers (int, optional) – How often to repeat the skip. Defaults to 2.

Notes

If the skip-connection sets norm_first to True, the final output of the last repetition will also be normalized (with a fresh instance of the exact same norm type used by the skip-connection).

forward(inp)[source]

Forward pass through a stack of identical skip-connection blocks.

Parameters: inp (Tensor) – The size of the last dimension is expected to be mod_dim.
Returns: Same dimensions and sizes as the input tensor.
Return type: Tensor

property layers: Range of layer indices.

new()[source]: Return a fresh, new instance with exactly the same parameters.

reset_parameters()[source]: Re-initialize the internal parameters of all blocks.
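
The stacking behavior, including the extra final norm mentioned in the notes for norm_first=True, can be sketched with toy classes that operate on floats. `ToySkip` and `ToyRepeat` are hypothetical, and creating every layer through new() is an assumption of this sketch rather than a documented detail.

```python
class ToySkip:
    """Toy skip-connection: block scales its input, norm halves it."""

    def __init__(self, scale, norm_first=True):
        self.scale = scale
        self.norm_first = norm_first

    def new(self):
        """Return a fresh instance with the same configuration."""
        return ToySkip(self.scale, self.norm_first)

    @staticmethod
    def norm(x):
        return x / 2

    def __call__(self, x):
        if self.norm_first:
            return x + self.scale * self.norm(x)
        return self.norm(x + self.scale * x)


class ToyRepeat:
    """Stack n_layers independent copies of a skip-connection."""

    def __init__(self, skip, n_layers=2):
        # Assumption: each layer is an independent copy made with new().
        self.skips = [skip.new() for _ in range(n_layers)]
        # With norm_first, the final output is normalized once more.
        self.final_norm = skip.norm if skip.norm_first else (lambda x: x)

    def __call__(self, x):
        for skip in self.skips:
            x = skip(x)
        return self.final_norm(x)


pre_stack = ToyRepeat(ToySkip(1.0, norm_first=True))
post_stack = ToyRepeat(ToySkip(1.0, norm_first=False))
pre_out = pre_stack(4.0)
post_out = post_stack(4.0)
```

In the norm_first=True case the two layers map 4.0 to 6.0 and then to 9.0, and the additional final norm halves that to 4.5.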

Base classes

class Resettable(*_, **__)[source]

Bases: Module, ABC

Abstract base class for Modules with a reset_parameters method.

abstractmethod reset_parameters()[source]: Subclasses implement in-place reset of all internal parameters.

class Block(*_, **__)[source]

Bases: Resettable

Abstract base class for stackable/repeatable neural-network components.

The input and output tensors of such components must have the same dimensions and sizes!

abstractmethod new()[source]: Return a fresh, new instance with exactly the same parameters.
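
The contract imposed by the two abstract methods can be illustrated with a dependency-free stand-in. The classes `ToyResettable`, `ToyBlock`, and `Scale` below are hypothetical and do not derive from Module as the real base classes do; they only mirror the requirement that new() returns an independent instance with the same configuration and that reset_parameters() works in place.

```python
from abc import ABC, abstractmethod


class ToyResettable(ABC):
    """Stand-in for Resettable, without the Module base."""

    @abstractmethod
    def reset_parameters(self):
        """Reset all internal parameters in place."""


class ToyBlock(ToyResettable):
    """Stand-in for Block, adding the new() requirement."""

    @abstractmethod
    def new(self):
        """Return a fresh instance with the same configuration."""


class Scale(ToyBlock):
    """Element-wise scaling that preserves the size of its input."""

    def __init__(self, mod_dim, init=1.0):
        self.mod_dim = mod_dim
        self.init = init
        self.reset_parameters()

    def reset_parameters(self):
        self.weights = [self.init] * self.mod_dim

    def new(self):
        return Scale(self.mod_dim, self.init)

    def __call__(self, inp):
        return [w * x for w, x in zip(self.weights, inp)]


original = Scale(mod_dim=2)
original.weights[0] = 5.0  # pretend that training changed a parameter
clone = original.new()     # same configuration, freshly initialized
```

The clone starts from its initial weights and shares no state with the original, which is what allows a wrapper to build several independent layers from one template.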