blocks
Flexible and composable building blocks for constructing neural networks.
After features are embedded and combined, it is time to extract as much information as possible to predict the desired target. One way of doing this systematically is to repeat layers of identical internal architecture with residual (or skip) connections between them.
- class ActivatedBlock(mod_dim, activate=ELU(alpha=1.0), **kwargs)
Bases: Block
A single, non-linearly activated layer.
- Parameters:
mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
activate (Module or function, optional) – The activation function to be applied after the affine transformation. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to ELU().
**kwargs – Additional keyword arguments to pass through to the linear layers.
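To make the behavior concrete, here is a minimal sketch in plain PyTorch of what such a block computes. The class name and internal wiring are illustrative assumptions, not the library's actual code:

```python
import torch
from torch import nn


class ActivatedBlockSketch(nn.Module):
    """Hypothetical re-implementation: one affine layer plus an activation."""

    def __init__(self, mod_dim: int, activate=nn.ELU()) -> None:
        super().__init__()
        self.linear = nn.Linear(mod_dim, mod_dim)
        self.activate = activate

    def forward(self, inp: torch.Tensor) -> torch.Tensor:
        # The last dimension keeps its size, so blocks can be stacked.
        return self.activate(self.linear(inp))


out = ActivatedBlockSketch(mod_dim=8)(torch.randn(4, 8))
```

Because input and output sizes match in the last dimension, any number of such blocks can be chained or wrapped in a skip connection.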
- class ActivatedHiddenBlock(mod_dim, activate=ELU(alpha=1.0), drop=Dropout(p=0.0, inplace=False), hidden_factor=4, **kwargs)
Bases: Block
A single, non-linearly activated hidden layer of configurable size.
- Parameters:
mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
activate (Module or function, optional) – The activation function to be applied after projecting into higher-dimensional space. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to ELU().
drop (Module, optional) – Dropout to be applied after activation. Typically an instance of Dropout or AlphaDropout. Defaults to Dropout(p=0.0), resulting in no dropout being applied.
hidden_factor (int, optional) – The size of the hidden layer is this integer factor times mod_dim. Defaults to 4.
**kwargs – Additional keyword arguments to pass through to the linear layers.
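Conceptually, this is the classic feed-forward pattern: widen, activate, drop, narrow. A minimal sketch of that pattern in plain PyTorch (illustrative only, not the library's code):

```python
import torch
from torch import nn


class ActivatedHiddenBlockSketch(nn.Module):
    """Hypothetical sketch: widen by hidden_factor, activate, drop, narrow."""

    def __init__(self, mod_dim, activate=nn.ELU(),
                 drop=nn.Dropout(0.0), hidden_factor=4):
        super().__init__()
        self.widen = nn.Linear(mod_dim, hidden_factor * mod_dim)
        self.activate = activate
        self.drop = drop
        self.narrow = nn.Linear(hidden_factor * mod_dim, mod_dim)

    def forward(self, inp):
        # Widen to hidden_factor * mod_dim, then project back down.
        return self.narrow(self.drop(self.activate(self.widen(inp))))


out = ActivatedHiddenBlockSketch(8)(torch.randn(2, 5, 8))
```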
- class GatedBlock(mod_dim, gate=Sigmoid(), **kwargs)
Bases: Block
A configurable, gated linear unit (GLU).
- Parameters:
mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
gate (Module or function, optional) – The activation function to be applied to half of the (linearly) projected input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.
**kwargs – Additional keyword arguments to pass through to the linear layers.
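The gating mechanism itself is easy to sketch in plain PyTorch: project to twice the width, split in half, and gate one half with the other. The class below is an illustrative assumption, not the library's implementation:

```python
import torch
from torch import nn


class GatedBlockSketch(nn.Module):
    """Hypothetical GLU sketch: gate half of a doubled projection with the other."""

    def __init__(self, mod_dim, gate=nn.Sigmoid()):
        super().__init__()
        self.project = nn.Linear(mod_dim, 2 * mod_dim)
        self.gate = gate

    def forward(self, inp):
        # Split the doubled projection into a signal half and a gating half.
        signal, gating = self.project(inp).chunk(2, dim=-1)
        return signal * self.gate(gating)


out = GatedBlockSketch(8)(torch.randn(3, 8))
```

With the default sigmoid gate this reduces to the standard GLU (as computed by torch.nn.functional.glu on the projected tensor); swapping in another callable generalizes it.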
- class GatedHiddenBlock(mod_dim, gate=Sigmoid(), drop=Dropout(p=0.0, inplace=False), hidden_factor=4, **kwargs)
Bases: Block
A configurable, gated linear unit (GLU) with single hidden layer.
- Parameters:
mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
gate (Module or function, optional) – The activation function to be applied to half of the (linearly) projected input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.
drop (Module, optional) – Dropout to be applied after gating. Typically an instance of Dropout or AlphaDropout. Defaults to Dropout(p=0.0), resulting in no dropout being applied.
hidden_factor (int, optional) – The size of the hidden layer before reducing by two through gating is this integer factor times mod_dim. Defaults to 4.
**kwargs – Additional keyword arguments to pass through to the linear layers.
- property dim
The hidden dimension after gating.
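Reading the parameter descriptions together, the hidden layer has hidden_factor * mod_dim units, gating halves that width, dropout follows, and a final projection restores mod_dim. A hedged sketch of that reading in plain PyTorch (the wiring is inferred from the docs, not taken from the library's source):

```python
import torch
from torch import nn


class GatedHiddenBlockSketch(nn.Module):
    """Hypothetical sketch: widen, gate (halving the width), drop, narrow."""

    def __init__(self, mod_dim, gate=nn.Sigmoid(),
                 drop=nn.Dropout(0.0), hidden_factor=4):
        super().__init__()
        self.dim = hidden_factor * mod_dim // 2  # hidden dimension after gating
        self.widen = nn.Linear(mod_dim, hidden_factor * mod_dim)
        self.gate = gate
        self.drop = drop
        self.narrow = nn.Linear(self.dim, mod_dim)

    def forward(self, inp):
        signal, gating = self.widen(inp).chunk(2, dim=-1)
        # Dropout is applied after gating, per the parameter description.
        return self.narrow(self.drop(signal * self.gate(gating)))


block = GatedHiddenBlockSketch(8)
out = block(torch.randn(3, 8))
```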
- class ActivatedGatedBlock(mod_dim, activate=ELU(alpha=1.0), gate=Sigmoid(), drop=Dropout(p=0.0, inplace=False), hidden_factor=4, **kwargs)
Bases: Block
An activated, hidden layer, followed by a gated linear unit (GLU).
- Parameters:
mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
activate (Module or function, optional) – The activation function to be applied after (linear) projection, but prior to gating. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to an ELU activation.
gate (Module or function, optional) – The activation function to be applied to half of the (non-linearly) projected input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.
drop (Module, optional) – Dropout to be applied before gating. Typically an instance of Dropout or AlphaDropout. Defaults to Dropout(p=0.0), resulting in no dropout being applied.
hidden_factor (int, optional) – The size of the hidden layer is this integer factor times mod_dim. Defaults to 4.
**kwargs – Additional keyword arguments to pass through to the linear layers.
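Per the parameter docs, this block activates the widened projection, applies dropout before gating, and then gates and narrows. A hedged sketch of that sequence in plain PyTorch (the exact wiring is an inference from the descriptions above, not the library's code):

```python
import torch
from torch import nn


class ActivatedGatedBlockSketch(nn.Module):
    """Hypothetical sketch: activated hidden layer, dropout, then a GLU back down."""

    def __init__(self, mod_dim, activate=nn.ELU(), gate=nn.Sigmoid(),
                 drop=nn.Dropout(0.0), hidden_factor=4):
        super().__init__()
        self.widen = nn.Linear(mod_dim, hidden_factor * mod_dim)
        self.activate = activate
        self.drop = drop
        self.gate = gate
        self.narrow = nn.Linear(hidden_factor * mod_dim // 2, mod_dim)

    def forward(self, inp):
        # Activation after projection, dropout before gating.
        hidden = self.drop(self.activate(self.widen(inp)))
        signal, gating = hidden.chunk(2, dim=-1)
        return self.narrow(signal * self.gate(gating))


out = ActivatedGatedBlockSketch(8)(torch.randn(3, 8))
```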
- class GatedResidualBlock(mod_dim, activate=ELU(alpha=1.0), gate=Sigmoid(), drop=Dropout(p=0.0, inplace=False), **kwargs)
Bases: Block
Gated Residual Network (GRN) for efficiently extracting information.
- Parameters:
mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
activate (Module or function, optional) – The activation function to be applied after (linear) projection, but prior to gating. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to an ELU activation.
gate (Module or function, optional) – The activation function to be applied to half of the (non-linearly) projected input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.
drop (Module, optional) – Dropout to be applied within the GRN. Typically an instance of Dropout or AlphaDropout. Defaults to Dropout(p=0.0), resulting in no dropout being applied.
**kwargs – Additional keyword arguments to pass through to the linear layers.
Note
This implementation is inspired by how features are encoded in Temporal Fusion Transformers, [1] but it is not quite the same. Firstly, the intermediate linear layer (Eq. 3) is eliminated and dropout is applied directly to the activations after the first layer. Secondly, the layer norm (Eq. 2) is replaced by simply dividing the sum of (linearly projected) input and gated signal by 2. Should additional normalization be desired, it can be performed independently on the output of this module.
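One plausible reading of this note can be sketched in plain PyTorch: dropout on the activated first layer, a GLU-style gate, and an average of the projected input and the gated signal in place of a layer norm. This is an illustrative interpretation of the note, not the library's actual implementation:

```python
import torch
from torch import nn


class GatedResidualBlockSketch(nn.Module):
    """One possible reading of the note above; wiring is an assumption."""

    def __init__(self, mod_dim, activate=nn.ELU(),
                 gate=nn.Sigmoid(), drop=nn.Dropout(0.0)):
        super().__init__()
        self.project = nn.Linear(mod_dim, mod_dim)  # linearly projected skip path
        self.first = nn.Linear(mod_dim, mod_dim)    # first layer (activated, dropped)
        self.activate = activate
        self.drop = drop
        self.glu = nn.Linear(mod_dim, 2 * mod_dim)  # doubled projection for gating
        self.gate = gate

    def forward(self, inp):
        # Dropout directly on the activations after the first layer.
        hidden = self.drop(self.activate(self.first(inp)))
        signal, gating = self.glu(hidden).chunk(2, dim=-1)
        # Divide by 2 instead of applying a layer norm, as described above.
        return (self.project(inp) + signal * self.gate(gating)) / 2


out = GatedResidualBlockSketch(8)(torch.randn(3, 8))
```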
References
[1] Lim, B., Arık, S. Ö., Loeff, N., & Pfister, T. (2021). Temporal Fusion Transformers for interpretable multi-horizon time series forecasting. International Journal of Forecasting, 37(4), 1748–1764.
- class SkipConnection(block, drop=Dropout(p=0.0, inplace=False), norm_first=True, norm_cls=Identity, *args, **kwargs)
Bases: Block
Add a residual/skip connection around the wrapped neural-network block.
- Parameters:
block (Block) – The block to wrap the residual/skip connection around. The reason why this cannot simply be a Module is that currently PyTorch does not provide a reasonable way of cloning them.
drop (Module, optional) – Dropout to be applied to the output of block before adding it to its input. Typically an instance of Dropout or AlphaDropout. Defaults to Dropout(p=0.0), resulting in no dropout being applied.
norm_first (bool, optional) – If True, normalize the inputs before passing them through the block and adding the outputs to the raw inputs. If False, pass inputs through the block first and normalize the sum of the inputs and outputs afterward. Defaults to True.
norm_cls (type, optional) – The class of the norm to be applied after adding input to output, e.g., LayerNorm or BatchNorm1d. Again, this is needed to easily create fresh, new instances with equal, but independent parameters. Defaults to Identity, resulting in no normalization whatsoever.
*args – Arguments used to initialize an instance of norm_cls.
**kwargs – Keyword arguments used to initialize an instance of norm_cls.
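The pre-norm/post-norm distinction is easiest to see in code. Below is a minimal sketch of such a wrapper in plain PyTorch; the class name and internals are illustrative assumptions, not the library's implementation:

```python
import torch
from torch import nn


class SkipConnectionSketch(nn.Module):
    """Hypothetical residual wrapper; norm placement follows norm_first."""

    def __init__(self, block, drop=nn.Dropout(0.0), norm_first=True,
                 norm_cls=nn.Identity, *args, **kwargs):
        super().__init__()
        self.block = block
        self.drop = drop
        self.norm_first = norm_first
        # A fresh norm instance is built from its class and init arguments.
        self.norm = norm_cls(*args, **kwargs)

    def forward(self, inp):
        if self.norm_first:
            # Pre-norm: normalize, transform, then add to the raw input.
            return inp + self.drop(self.block(self.norm(inp)))
        # Post-norm: transform, add, then normalize the sum.
        return self.norm(inp + self.drop(self.block(inp)))


skip = SkipConnectionSketch(nn.Linear(8, 8), nn.Dropout(0.0), True, nn.LayerNorm, 8)
out = skip(torch.randn(3, 8))
```

Passing the norm as a class plus init arguments (rather than an instance) is what allows each wrapper in a stack to hold its own, independently parameterized norm.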
- class Repeat(skip, n_layers=2)
Bases: Block
Repeat a skip-connection, distilling ever finer detail from your data.
- Parameters:
skip (SkipConnection) – An instance of a SkipConnection to repeat.
n_layers (int, optional) – How often to repeat the skip. Defaults to 2.
Notes
If the skip-connection sets norm_first to True, the final output of the last repetition will also be normalized (with a fresh instance of the exact same norm type used by the skip-connection).
- forward(inp)
Forward pass through a stack of identical skip-connection blocks.
- Parameters:
inp (Tensor) – The size of the last dimension is expected to be mod_dim.
- Returns:
Same dimensions and sizes as the input tensor.
- Return type:
Tensor
- property layers
Range of layer indices.
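The essence of repeating a shape-preserving residual block can be sketched in plain PyTorch. Here a factory function stands in for the library's cloning of the skip-connection, so every layer still gets equal but independent parameters; names and wiring are illustrative assumptions:

```python
import torch
from torch import nn


class ResidualLinear(nn.Module):
    """Minimal stand-in for a shape-preserving skip-connection block."""

    def __init__(self, mod_dim):
        super().__init__()
        self.linear = nn.Linear(mod_dim, mod_dim)

    def forward(self, inp):
        return inp + self.linear(inp)


class RepeatSketch(nn.Module):
    """Hypothetical sketch: stack fresh copies of a residual block."""

    def __init__(self, make_skip, n_layers=2):
        super().__init__()
        # One fresh instance per layer, since shapes match end to end.
        self.stack = nn.Sequential(*(make_skip() for _ in range(n_layers)))

    def forward(self, inp):
        return self.stack(inp)


stack = RepeatSketch(lambda: ResidualLinear(8), n_layers=3)
out = stack(torch.randn(4, 8))
```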
Base classes
- class Resettable(*_, **__)
Bases: Module, ABC
Abstract base class for Modules with a reset_parameters method.
- class Block(*_, **__)
Bases: Resettable
Abstract base class for stackable/repeatable neural-network components.
The input and output tensors of such components must have the same dimensions and sizes!