blocks
Flexible and composable building blocks for constructing neural networks.
After features are embedded and combined, it is time to extract as much information as possible to predict the desired target. One way of doing this systematically is to repeat layers of identical internal architecture with residual (or skip) connections between them.
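For orientation, a hand-rolled version of that pattern in plain PyTorch might look as follows (all names here are illustrative and not part of this package):

```python
import torch
from torch import nn

class ResidualStack(nn.Module):
    """Repeat one layer architecture, adding each layer's output to its input."""

    def __init__(self, mod_dim: int, n_layers: int = 2) -> None:
        super().__init__()
        # Every repetition has the same internal architecture ...
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Linear(mod_dim, mod_dim), nn.ELU())
            for _ in range(n_layers)
        )

    def forward(self, inp: torch.Tensor) -> torch.Tensor:
        out = inp
        for layer in self.layers:
            # ... and its output is added back onto its input (the skip connection).
            out = out + layer(out)
        return out

stack = ResidualStack(mod_dim=16, n_layers=3)
print(stack(torch.randn(8, 16)).shape)  # torch.Size([8, 16])
```

The classes documented below package this idea into reusable, composable pieces.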
- class ActivatedBlock(mod_dim, activate=ELU(alpha=1.0), **kwargs)[source]
Bases: Block
A single, non-linearly activated layer.
- Parameters:
mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
activate (Module or function, optional) – The activation function to be applied after the affine transformation. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to ELU().
**kwargs – Additional keyword arguments to pass through to the linear layers.
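A minimal usage sketch; the swak.pt.blocks import path is an assumption, not something stated on this page:

```python
import torch
from torch.nn import GELU
from swak.pt.blocks import ActivatedBlock  # assumed import path

# One affine transformation followed by a non-linearity; the size of the
# last dimension stays at mod_dim, so the block can be stacked or repeated.
block = ActivatedBlock(mod_dim=32, activate=GELU())

out = block(torch.randn(64, 32))
print(out.shape)  # torch.Size([64, 32])
```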
- class ActivatedHiddenBlock(mod_dim, activate=ELU(alpha=1.0), drop=Dropout(p=0.0, inplace=False), hidden_factor=4, **kwargs)[source]
Bases: Block
A single, non-linearly activated hidden layer of configurable size.
- Parameters:
mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
activate (Module or function, optional) – The activation function to be applied after projecting into higher-dimensional space. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to ELU().
drop (Module, optional) – Dropout to be applied after activation. Typically an instance of Dropout or AlphaDropout. Defaults to Dropout(p=0.0), resulting in no dropout being applied.
hidden_factor (int, optional) – The size of the hidden layer is this integer factor times mod_dim. Defaults to 4.
**kwargs – Additional keyword arguments to pass through to the linear layers.
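A usage sketch, again assuming the swak.pt.blocks import path:

```python
import torch
from torch.nn import ELU, Dropout
from swak.pt.blocks import ActivatedHiddenBlock  # assumed import path

# Project up to hidden_factor * mod_dim, activate, apply dropout, and
# project back down so the last dimension is mod_dim again.
block = ActivatedHiddenBlock(
    mod_dim=32,
    activate=ELU(),
    drop=Dropout(p=0.1),
    hidden_factor=4,
)

out = block(torch.randn(64, 32))
print(out.shape)  # torch.Size([64, 32])
```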
- class GatedBlock(mod_dim, gate=Sigmoid(), **kwargs)[source]
Bases: Block
A configurable, gated linear unit (GLU).
- Parameters:
mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
gate (Module or function, optional) – The activation function to be applied to half of the (linearly) projected input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.
**kwargs – Additional keyword arguments to pass through to the linear layers.
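A usage sketch under the same import-path assumption:

```python
import torch
from torch.nn import Sigmoid
from swak.pt.blocks import GatedBlock  # assumed import path

# The input is linearly projected into two halves; the gate squashes one
# half, which then multiplies the other element-wise (a GLU).
glu = GatedBlock(mod_dim=32, gate=Sigmoid())

out = glu(torch.randn(64, 32))
print(out.shape)  # torch.Size([64, 32])
```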
- class GatedHiddenBlock(mod_dim, gate=Sigmoid(), drop=Dropout(p=0.0, inplace=False), hidden_factor=4, **kwargs)[source]
Bases: Block
A configurable, gated linear unit (GLU) with a single hidden layer.
- Parameters:
mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
gate (Module or function, optional) – The activation function to be applied to half of the (linearly) projected input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.
drop (Module, optional) – Dropout to be applied after gating. Typically an instance of Dropout or AlphaDropout. Defaults to Dropout(p=0.0), resulting in no dropout being applied.
hidden_factor (int, optional) – The size of the hidden layer before reducing by two through gating is this integer factor times mod_dim. Defaults to 4.
**kwargs – Additional keyword arguments to pass through to the linear layers.
- property dim
The hidden dimension after gating.
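A usage sketch (import path assumed; the value of dim is inferred from the hidden_factor description above):

```python
import torch
from torch.nn import Sigmoid, Dropout
from swak.pt.blocks import GatedHiddenBlock  # assumed import path

# GLU with a hidden layer: the projection has size hidden_factor * mod_dim
# before gating halves it, and the output is mapped back to mod_dim.
block = GatedHiddenBlock(
    mod_dim=32,
    gate=Sigmoid(),
    drop=Dropout(p=0.1),
    hidden_factor=4,
)

print(block.dim)  # hidden dimension after gating, presumably 4 * 32 // 2 = 64
out = block(torch.randn(16, 32))
print(out.shape)  # torch.Size([16, 32])
```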
- class ActivatedGatedBlock(mod_dim, activate=ELU(alpha=1.0), gate=Sigmoid(), drop=Dropout(p=0.0, inplace=False), hidden_factor=4, **kwargs)[source]
Bases: Block
An activated hidden layer, followed by a gated linear unit (GLU).
- Parameters:
mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
activate (Module or function, optional) – The activation function to be applied after (linear) projection, but prior to gating. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to an ELU activation.
gate (Module or function, optional) – The activation function to be applied to half of the (non-linearly) projected input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.
drop (Module, optional) – Dropout to be applied before gating. Typically an instance of Dropout or AlphaDropout. Defaults to Dropout(p=0.0), resulting in no dropout being applied.
hidden_factor (int, optional) – The size of the hidden layer is this integer factor times mod_dim. Defaults to 4.
**kwargs – Additional keyword arguments to pass through to the linear layers.
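A usage sketch, with the same caveat about the import path:

```python
import torch
from torch.nn import ELU, Sigmoid, Dropout
from swak.pt.blocks import ActivatedGatedBlock  # assumed import path

# Activated up-projection first, then dropout, then a GLU that brings the
# representation back down to mod_dim.
block = ActivatedGatedBlock(
    mod_dim=32,
    activate=ELU(),
    gate=Sigmoid(),
    drop=Dropout(p=0.1),
    hidden_factor=4,
)

out = block(torch.randn(16, 32))
print(out.shape)  # torch.Size([16, 32])
```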
- class GatedResidualBlock(mod_dim, activate=ELU(alpha=1.0), gate=Sigmoid(), drop=Dropout(p=0.0, inplace=False), **kwargs)[source]
Bases: Block
Gated Residual Network (GRN) for efficiently extracting information.
- Parameters:
mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
activate (Module or function, optional) – The activation function to be applied after (linear) projection, but prior to gating. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to an ELU activation.
gate (Module or function, optional) – The activation function to be applied to half of the (non-linearly) projected input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.
drop (Module, optional) – Dropout to be applied within the GRN. Typically an instance of Dropout or AlphaDropout. Defaults to Dropout(p=0.0), resulting in no dropout being applied.
**kwargs – Additional keyword arguments to pass through to the linear layers.
Note
This implementation is inspired by how features are encoded in Temporal Fusion Transformers, [1] but it is not quite the same. Firstly, the intermediate linear layer (Eq. 3) is eliminated and dropout is applied directly to the activations after the first layer. Secondly, the layer norm (Eq. 2) is replaced by simply dividing the sum of (linearly projected) input and gated signal by 2. Should additional normalization be desired, it can be performed independently on the output of this module.
References
[1] Lim, B., Arık, S. Ö., Loeff, N., and Pfister, T. (2021). Temporal Fusion Transformers for interpretable multi-horizon time series forecasting. International Journal of Forecasting, 37(4). arXiv:1912.09363
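A usage sketch (import path assumed):

```python
import torch
from torch.nn import ELU, Sigmoid, Dropout
from swak.pt.blocks import GatedResidualBlock  # assumed import path

# GRN-style block: activated projection, dropout, gating, and the linearly
# projected input folded back in; the last dimension stays at mod_dim.
grn = GatedResidualBlock(mod_dim=32, activate=ELU(), gate=Sigmoid(), drop=Dropout(p=0.1))

out = grn(torch.randn(16, 32))
print(out.shape)  # torch.Size([16, 32])
```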
- class SkipConnection(block, drop=Dropout(p=0.0, inplace=False), norm_first=True, norm_cls=<class 'swak.pt.misc.Identity'>, *args, **kwargs)[source]
Bases: Block
Add a residual/skip connection around the wrapped neural-network block.
- Parameters:
block (Block) – The block to wrap the residual/skip connection around. The reason why this cannot simply be a Module is that PyTorch currently does not provide a reasonable way of cloning them.
drop (Module, optional) – Dropout to be applied to the output of block before adding it to its input. Typically an instance of Dropout or AlphaDropout. Defaults to Dropout(p=0.0), resulting in no dropout being applied.
norm_first (bool, optional) – If True, normalize the inputs before passing them through the block and adding the outputs to the raw inputs. If False, pass inputs through the block first and normalize the sum of the inputs and outputs afterward. Defaults to True.
norm_cls (type, optional) – The class of the norm to be applied after adding input to output, e.g., LayerNorm or BatchNorm1d. Again, this is needed to easily create fresh, new instances with equal, but independent parameters. Defaults to Identity, resulting in no normalization whatsoever.
*args – Arguments used to initialize an instance of norm_cls.
**kwargs – Keyword arguments used to initialize an instance of norm_cls.
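A usage sketch; the import path and the forwarding of normalized_shape through **kwargs to LayerNorm follow from the parameter descriptions above but are not shown verbatim on this page:

```python
import torch
from torch.nn import LayerNorm, Dropout
from swak.pt.blocks import ActivatedHiddenBlock, SkipConnection  # assumed import path

# Pre-norm residual wrapper: normalize the inputs, pass them through the
# block, apply dropout, and add the result back onto the raw inputs.
skip = SkipConnection(
    ActivatedHiddenBlock(mod_dim=32),
    drop=Dropout(p=0.1),
    norm_first=True,
    norm_cls=LayerNorm,
    normalized_shape=32,  # keyword argument used to initialize norm_cls
)

out = skip(torch.randn(16, 32))
print(out.shape)  # torch.Size([16, 32])
```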
- class Repeat(skip, n_layers=2)[source]
Bases: Block
Repeat a skip-connection, distilling ever finer detail from your data.
- Parameters:
skip (SkipConnection) – An instance of a SkipConnection to repeat.
n_layers (int, optional) – How often to repeat the skip. Defaults to 2.
Notes
If the skip-connection sets norm_first to True, the final output of the last repetition will also be normalized (with a fresh instance of the exact same norm type used by the skip-connection).
- forward(inp)[source]
Forward pass through a stack of identical skip-connection blocks.
- Parameters:
inp (Tensor) – The size of the last dimension is expected to be mod_dim.
- Returns:
Same dimensions and sizes as the input tensor.
- Return type:
Tensor
- property layers
Range of layer indices.
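Putting the pieces together; a sketch under the same import-path assumption:

```python
import torch
from torch.nn import LayerNorm, Dropout
from swak.pt.blocks import GatedResidualBlock, SkipConnection, Repeat  # assumed import path

# Repeat one skip-connection four times, each repetition presumably getting
# a fresh copy of the wrapped block with its own, independent parameters.
skip = SkipConnection(
    GatedResidualBlock(mod_dim=32),
    drop=Dropout(p=0.1),
    norm_first=True,
    norm_cls=LayerNorm,
    normalized_shape=32,  # keyword argument used to initialize norm_cls
)
stack = Repeat(skip, n_layers=4)

out = stack(torch.randn(16, 32))
print(out.shape)     # torch.Size([16, 32])
print(stack.layers)  # range of layer indices, e.g. range(0, 4)
```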
Base classes
- class Resettable(*_, **__)[source]
Bases: Module, ABC
Abstract base class for Modules with a reset_parameters method.
- class Block(*_, **__)[source]
Bases: Resettable
Abstract base class for stackable/repeatable neural-network components.
The input and output tensors of such components must have the same dimensions and sizes!
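A rough sketch of a custom component, assuming that forward and reset_parameters are the members a concrete subclass has to provide; the real abstract interface of Block may demand more (for example, a hook for creating fresh copies), so treat this as illustration only:

```python
import torch
from torch import nn
from swak.pt.blocks import Block  # assumed import path

class ScaledLinear(Block):
    """Toy block: the last dimension of input and output is mod_dim."""

    def __init__(self, mod_dim: int) -> None:
        super().__init__()
        self.linear = nn.Linear(mod_dim, mod_dim)

    def forward(self, inp: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.linear(inp))

    def reset_parameters(self) -> None:
        # Re-initialize all learnable parameters in place.
        self.linear.reset_parameters()
```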