blocks
Flexible and composable building blocks for constructing neural networks.
After features are embedded and combined, it is time to extract as much information as possible to predict the desired target. One way of doing this systematically is to repeat layers of identical internal architecture with residual (or skip) connections between them.
- class IdentityBlock(mod_dim, *_, **__)
Bases: Block
PyTorch module that passes a tensor right through, doing nothing.
This is a placeholder for instances where a default Module is required that not only has a reset_parameters() method, but also a new() method in addition to the mod_dim, device, and dtype properties. Providing any number of (keyword) arguments on instantiation is permitted, but they are ignored.
- Parameters:
mod_dim (int) – Ignored but mandatory to maintain API compatibility.
- property device
Just for API compatibility. Always returns None.
- property dtype
Just for API compatibility. Always returns None.
- forward(tensor, *_, **__)
Pass through first argument, ignore additional (keyword) arguments.
- Parameters:
tensor (Tensor) – Any argument (typically a tensor) to be passed straight through.
- Returns:
The tensor passed in as argument.
- Return type:
Tensor
- property mod_dim
The model dimension.
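For illustration, a pass-through module with the same interface can be sketched in plain PyTorch. The class name PassThrough is hypothetical and this is not the library's source. Two details are assumptions about what such placeholder methods would do: reset_parameters() does nothing, and new() simply builds another instance with the same mod_dim.

```python
import torch
from torch import nn


class PassThrough(nn.Module):
    """Hypothetical sketch of a do-nothing block with a block-like interface."""

    def __init__(self, mod_dim: int, *_, **__) -> None:
        super().__init__()
        self._mod_dim = mod_dim  # stored only so the property can report it

    @property
    def mod_dim(self) -> int:
        return self._mod_dim

    @property
    def device(self) -> None:
        return None  # no parameters, hence no device

    @property
    def dtype(self) -> None:
        return None  # no parameters, hence no dtype

    def forward(self, tensor: torch.Tensor, *_, **__) -> torch.Tensor:
        return tensor  # extra (keyword) arguments are ignored

    def reset_parameters(self) -> None:
        pass  # nothing to reset

    def new(self) -> "PassThrough":
        return self.__class__(self.mod_dim)  # fresh instance, same settings


block = PassThrough(16, "ignored", foo="also ignored")
x = torch.randn(2, 3, 16)
y = block(x, "extra", mask=None)
```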
- class ActivatedBlock(mod_dim, activate=ELU(alpha=1.0), bias=True, device='cpu', dtype=torch.float32)
Bases: Block
A single, non-linearly activated layer.
- Parameters:
mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
activate (Module or function, optional) – The activation function to be applied after the linear projection. Must be a callable that accepts a tensor as its sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to ELU().
bias (bool, optional) – Whether to add a learnable bias vector to the projections. Defaults to True.
device (str or torch.device, optional) – Torch device to first create the block on. Defaults to “cpu”.
dtype (torch.dtype, optional) – Torch dtype to first create the block in. Defaults to torch.float.
- property device
The device all weights, biases, activations, etc. reside on.
- property dtype
The dtype of all weights, biases, activations, and parameters.
- forward(inp)
Forward pass through a single, non-linearly activated layer.
- Parameters:
inp (Tensor) – The size of the last dimension is expected to be mod_dim.
- Returns:
Same dimensions and sizes as the input tensor.
- Return type:
Tensor
- property mod_dim
The model dimension.
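A minimal sketch of what a single activated layer computes is shown below. It assumes the layer is one square nn.Linear(mod_dim, mod_dim) projection followed by the activation. That layout is inferred from the description above, not taken from the library's source, and the class name ActivatedLayerSketch is hypothetical.

```python
import torch
from torch import nn


class ActivatedLayerSketch(nn.Module):
    """Hypothetical sketch: one square linear projection, then an activation."""

    def __init__(self, mod_dim, activate=nn.ELU(), bias=True,
                 device="cpu", dtype=torch.float):
        super().__init__()
        self.activate = activate
        # A square projection keeps the last dimension at mod_dim.
        self.project = nn.Linear(mod_dim, mod_dim, bias=bias,
                                 device=device, dtype=dtype)

    @property
    def mod_dim(self) -> int:
        return self.project.in_features

    @property
    def device(self) -> torch.device:
        return self.project.weight.device

    @property
    def dtype(self) -> torch.dtype:
        return self.project.weight.dtype

    def reset_parameters(self) -> None:
        self.project.reset_parameters()

    def forward(self, inp: torch.Tensor) -> torch.Tensor:
        return self.activate(self.project(inp))


# A plain function from torch.nn.functional works as well as a module.
block = ActivatedLayerSketch(8, activate=torch.nn.functional.gelu)
out = block(torch.randn(4, 5, 8))
```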
- class ActivatedHiddenBlock(mod_dim, activate=ELU(alpha=1.0), factor=4, bias=True, device='cpu', dtype=torch.float32)
Bases: Block
A single, non-linearly activated hidden layer of configurable size.
- Parameters:
mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
activate (Module or function, optional) – The activation function to be applied after projecting into higher-dimensional space. Must be a callable that accepts a tensor as its sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to ELU().
factor (int, optional) – The size of the hidden layer is this integer factor times mod_dim. Defaults to 4.
bias (bool, optional) – Whether to add a learnable bias vector to the projections. Defaults to True.
device (str or torch.device, optional) – Torch device to first create the block on. Defaults to “cpu”.
dtype (torch.dtype, optional) – Torch dtype to first create the block in. Defaults to torch.float.
- property device
The device all weights, biases, activations, etc. reside on.
- property dtype
The dtype of all weights, biases, activations, and parameters.
- forward(inp)
Forward pass through a single, non-linearly activated hidden layer.
- Parameters:
inp (Tensor) – The size of the last dimension is expected to be mod_dim.
- Returns:
Same dimensions and sizes as the input tensor.
- Return type:
Tensor
- property mod_dim
The model dimension.
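The widen, activate, and project-back pattern described above can be sketched as follows. The class name ActivatedHiddenSketch and the use of exactly two nn.Linear layers are assumptions for illustration, not the library's implementation.

```python
import torch
from torch import nn


class ActivatedHiddenSketch(nn.Module):
    """Hypothetical sketch: widen by `factor`, activate, project back."""

    def __init__(self, mod_dim, activate=nn.ELU(), factor=4, bias=True,
                 device="cpu", dtype=torch.float):
        super().__init__()
        hidden = factor * mod_dim  # size of the hidden layer
        self.widen = nn.Linear(mod_dim, hidden, bias=bias,
                               device=device, dtype=dtype)
        self.activate = activate
        self.shrink = nn.Linear(hidden, mod_dim, bias=bias,
                                device=device, dtype=dtype)

    def forward(self, inp: torch.Tensor) -> torch.Tensor:
        # (..., mod_dim) -> (..., factor * mod_dim) -> (..., mod_dim)
        return self.shrink(self.activate(self.widen(inp)))


block = ActivatedHiddenSketch(8, factor=3)
out = block(torch.randn(2, 8))
```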
- class GatedBlock(mod_dim, gate=Sigmoid(), bias=True, device='cpu', dtype=torch.float32)
Bases: Block
A configurable gated linear unit (GLU).
- Parameters:
mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
gate (Module or function, optional) – The activation function to be applied to one half of the (linearly) projected input before multiplying it with the other half. Must be a callable that accepts a tensor as its sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.
bias (bool, optional) – Whether to add a learnable bias vector to the projections. Defaults to True.
device (str or torch.device, optional) – Torch device to first create the block on. Defaults to “cpu”.
dtype (torch.dtype, optional) – Torch dtype to first create the block in. Defaults to torch.float.
- property device
The device all weights, biases, activations, etc. reside on.
- property dtype
The dtype of all weights, biases, activations, and parameters.
- forward(inp)
Forward pass through a single gated linear unit (GLU).
- Parameters:
inp (Tensor) – The size of the last dimension is expected to be mod_dim.
- Returns:
Same dimensions and sizes as the input tensor.
- Return type:
Tensor
- property mod_dim
The model dimension.
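A gated linear unit with a configurable gate can be sketched as below. In this sketch, one projection to 2 * mod_dim features is split into a value half and a gate-input half. The class name GLUSketch is hypothetical, and the choice of which half is gated is an assumption.

```python
import torch
from torch import nn


class GLUSketch(nn.Module):
    """Hypothetical sketch of a gated linear unit with a configurable gate."""

    def __init__(self, mod_dim, gate=nn.Sigmoid(), bias=True,
                 device="cpu", dtype=torch.float):
        super().__init__()
        self.gate = gate
        # One projection produces both halves: values and gate inputs.
        self.project = nn.Linear(mod_dim, 2 * mod_dim, bias=bias,
                                 device=device, dtype=dtype)

    def forward(self, inp: torch.Tensor) -> torch.Tensor:
        value, gate_in = self.project(inp).chunk(2, dim=-1)
        return value * self.gate(gate_in)


block = GLUSketch(6)
inp = torch.randn(3, 6)
out = block(inp)
```

With the default sigmoid gate, the sketch reduces to torch.nn.functional.glu applied to the projection, because glu also multiplies the first half by the sigmoid of the second half.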
- class GatedHiddenBlock(mod_dim, gate=Sigmoid(), factor=4, bias=True, device='cpu', dtype=torch.float32)
Bases: Block
A configurable gated linear unit (GLU) with a single hidden layer.
- Parameters:
mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
gate (Module or function, optional) – The activation function to be applied to one half of the (linearly) projected input before multiplying it with the other half. Must be a callable that accepts a tensor as its sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.
factor (int, optional) – The size of the hidden layer is this integer factor times mod_dim. Defaults to 4.
bias (bool, optional) – Whether to add a learnable bias vector to the projections. Defaults to True.
device (str or torch.device, optional) – Torch device to first create the block on. Defaults to “cpu”.
dtype (torch.dtype, optional) – Torch dtype to first create the block in. Defaults to torch.float.
- property device
The device all weights, biases, activations, etc. reside on.
- property dim
The hidden dimension after gating.
- property dtype
The dtype of all weights, biases, activations, and parameters.
- forward(inp)
Forward pass through a gated linear unit (GLU) with a hidden layer.
- Parameters:
inp (Tensor) – The size of the last dimension is expected to be mod_dim.
- Returns:
Same dimensions and sizes as the input tensor.
- Return type:
Tensor
- property mod_dim
The model dimension.
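The relationship between factor, the hidden dimension dim after gating, and the output size can be sketched as follows. The class name GatedHiddenSketch and its two-projection layout are assumptions for illustration. The input is first projected to 2 * dim features, and gating halves that to dim before projecting back to mod_dim.

```python
import torch
from torch import nn


class GatedHiddenSketch(nn.Module):
    """Hypothetical sketch: GLU into a wider hidden layer, then project back."""

    def __init__(self, mod_dim, gate=nn.Sigmoid(), factor=4, bias=True,
                 device="cpu", dtype=torch.float):
        super().__init__()
        self.dim = factor * mod_dim  # hidden dimension after gating
        self.gate = gate
        # Produce 2 * dim features so that gating leaves dim of them.
        self.widen = nn.Linear(mod_dim, 2 * self.dim, bias=bias,
                               device=device, dtype=dtype)
        self.shrink = nn.Linear(self.dim, mod_dim, bias=bias,
                                device=device, dtype=dtype)

    def forward(self, inp: torch.Tensor) -> torch.Tensor:
        value, gate_in = self.widen(inp).chunk(2, dim=-1)
        hidden = value * self.gate(gate_in)  # shape (..., dim)
        return self.shrink(hidden)


block = GatedHiddenSketch(5, factor=2)
out = block(torch.randn(7, 5))
```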
- class GatedActivatedBlock(mod_dim, activate=ELU(alpha=1.0), gate=Sigmoid(), bias=True, device='cpu', dtype=torch.float32)
Bases: Block
Gated Residual Network (GRN) for efficiently extracting information.
- Parameters:
mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
activate (Module or function, optional) – The activation function to be applied after the (linear) projection, but prior to gating. Must be a callable that accepts a tensor as its sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to an ELU activation.
gate (Module or function, optional) – The activation function to be applied to one half of the (non-linearly) projected input before multiplying it with the other half. Must be a callable that accepts a tensor as its sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.
bias (bool, optional) – Whether to add a learnable bias vector to the projections. Defaults to True.
device (str or torch.device, optional) – Torch device to first create the block on. Defaults to “cpu”.
dtype (torch.dtype, optional) – Torch dtype to first create the block in. Defaults to torch.float.
Note
Inspired by the Gated Residual Network (GRN) introduced in [1], this module (linearly) projects the input, applies a non-linearity, and gates the result with the gate activation (a sigmoid by default) of a projection of the same intermediate representation. This gives the model per-dimension control over how much non-linearity contributes to the output.
References
- property device
The device all weights, biases, activations, etc. reside on.
- property dtype
The dtype of all weights, biases, activations, and parameters.
- forward(inp)
Forward pass through a single gated residual network (GRN).
- Parameters:
inp (Tensor) – The size of the last dimension is expected to be mod_dim.
- Returns:
Same dimensions and sizes as the input tensor.
- Return type:
Tensor
- property mod_dim
The model dimension.
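The sequence described in the note above can be sketched as follows: project, apply the non-linearity, then feed a GLU from that same intermediate representation. GRNSketch is a hypothetical name, and the exact number and shapes of the projections are assumptions. The sketch omits the residual addition and normalization that the GRN of [1] includes. In this module family, that role is presumably played by wrapping the block in a SkipConnection.

```python
import torch
from torch import nn


class GRNSketch(nn.Module):
    """Hypothetical sketch: linear projection, non-linearity, then a GLU."""

    def __init__(self, mod_dim, activate=nn.ELU(), gate=nn.Sigmoid(),
                 bias=True, device="cpu", dtype=torch.float):
        super().__init__()
        self.activate = activate
        self.gate = gate
        self.project = nn.Linear(mod_dim, mod_dim, bias=bias,
                                 device=device, dtype=dtype)
        # Both halves of the GLU come from the same intermediate representation.
        self.glu = nn.Linear(mod_dim, 2 * mod_dim, bias=bias,
                             device=device, dtype=dtype)

    def forward(self, inp: torch.Tensor) -> torch.Tensor:
        hidden = self.activate(self.project(inp))
        value, gate_in = self.glu(hidden).chunk(2, dim=-1)
        # The gate decides, per output dimension, how much passes through.
        return value * self.gate(gate_in)


x = torch.randn(3, 4)
out = GRNSketch(4)(x)
# A gate that is always closed suppresses the whole contribution.
closed = GRNSketch(4, gate=torch.zeros_like)
```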
- class SkipConnection(block, dropout=0.0, norm_first=True, norm_cls=<class 'swak.pt.misc.identity.Identity'>, *args, device='cpu', dtype=torch.float32, **kwargs)
Bases: Block
Add a residual/skip connection around the wrapped neural-network block.
- Parameters:
block (Block) – The block to wrap the residual/skip connection around. The reason why this cannot simply be a Module is that PyTorch currently does not provide a reasonable way of cloning modules.
dropout (float, optional) – The amount of dropout to apply to the block’s output before adding it back to the input. Defaults to 0.
norm_first (bool, optional) – If True, normalize the inputs before passing them through the block and adding the outputs to the raw inputs. If False, pass inputs through the block first and normalize the sum of the inputs and outputs afterward. Defaults to True.
norm_cls (type, optional) – The class of the norm to be applied, either to the block inputs or to the sum of inputs and outputs depending on norm_first, e.g., LayerNorm or BatchNorm1d. Again, passing the class rather than an instance makes it easy to create fresh, new instances with equal, but independent parameters. Defaults to Identity, resulting in no normalization.
*args – Arguments used to initialize an instance of norm_cls.
device (str or torch.device, optional) – Torch device to first create the block on. Defaults to “cpu”.
dtype (torch.dtype, optional) – Torch dtype to first create the block in. Defaults to torch.float.
**kwargs – Keyword arguments used to initialize an instance of norm_cls.
See also
- property device
The device all weights, biases, activations, etc. reside on.
- property dtype
The dtype of all weights, biases, activations, and parameters.
- forward(inp)
Forward pass through a block with the input added to the output.
- Parameters:
inp (Tensor) – The size of the last dimension is expected to be mod_dim.
- Returns:
Same dimensions and sizes as the input tensor.
- Return type:
Tensor
- property mod_dim
The model dimension.
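The two orderings controlled by norm_first can be sketched as follows. SkipSketch is a hypothetical name, and the sketch differs from the documented class in two ways. It accepts any Module as block, since it never needs to clone it, and it ignores device and dtype. Treat it as an illustration of the arithmetic rather than the library's implementation.

```python
import torch
from torch import nn


class SkipSketch(nn.Module):
    """Hypothetical sketch of a residual wrapper with pre- or post-norm."""

    def __init__(self, block, dropout=0.0, norm_first=True,
                 norm_cls=nn.Identity, *args, **kwargs):
        super().__init__()
        self.block = block
        self.dropout = nn.Dropout(dropout)
        self.norm_first = norm_first
        # The norm is built from its class, so every wrapper gets its own.
        self.norm = norm_cls(*args, **kwargs)

    def forward(self, inp: torch.Tensor) -> torch.Tensor:
        if self.norm_first:
            # Pre-norm: normalize, transform, add to the raw input.
            return inp + self.dropout(self.block(self.norm(inp)))
        # Post-norm: transform, add to the input, normalize the sum.
        return self.norm(inp + self.dropout(self.block(inp)))


lin = nn.Linear(4, 4)
pre = SkipSketch(lin, norm_cls=nn.LayerNorm, normalized_shape=4)
post = SkipSketch(lin, norm_first=False, norm_cls=nn.LayerNorm,
                  normalized_shape=4)
x = torch.randn(3, 4)
```

Keyword arguments such as normalized_shape are forwarded to norm_cls, mirroring the documented *args and **kwargs.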
- class Repeat(skip, n_layers=2, device='cpu', dtype=torch.float32)
Bases: Block
Repeat a skip-connection, distilling ever finer detail from your data.
- Parameters:
skip (SkipConnection) – An instance of a SkipConnection to repeat.
n_layers (int, optional) – How often to repeat the skip. Defaults to 2.
device (str or torch.device, optional) – Torch device to first create the blocks on. Defaults to “cpu”.
dtype (torch.dtype, optional) – Torch dtype to first create the blocks in. Defaults to torch.float.
- Raises:
TypeError – If n_layers is not an integer.
ValueError – If n_layers is smaller than 1.
Note
If the skip-connection sets norm_first to True, no norm will be applied to the final output of the last repetition. If a trailing norm is desired, it should be applied externally, after this module.
See also
- property device
The device all weights, biases, activations, etc. reside on.
- property dtype
The dtype of all weights, biases, activations, and parameters.
- forward(inp)
Forward pass through a stack of identical skip-connection blocks.
- Parameters:
inp (Tensor) – The size of the last dimension is expected to be mod_dim.
- Returns:
Same dimensions and sizes as the input tensor.
- Return type:
Tensor
- property layers
Range of layer indices.
- property mod_dim
The model dimension.
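Stacking can be sketched as below. Cloning a Module is awkward, as the block parameter of SkipConnection explains. The hypothetical RepeatSketch therefore takes a zero-argument factory that builds a fresh layer per repetition. That factory is a swapped-in design choice, not the documented skip argument. The argument checks mirror the documented TypeError and ValueError, and layers mirrors the documented range of layer indices.

```python
import torch
from torch import nn


class RepeatSketch(nn.Module):
    """Hypothetical sketch: stack n_layers independently initialized layers."""

    def __init__(self, make_layer, n_layers=2):
        super().__init__()
        if not isinstance(n_layers, int):
            raise TypeError(
                f"n_layers must be an integer, not {type(n_layers).__name__}!"
            )
        if n_layers < 1:
            raise ValueError(f"n_layers must be at least 1, not {n_layers}!")
        # Calling the factory once per layer yields independent parameters.
        self.blocks = nn.ModuleList(make_layer() for _ in range(n_layers))

    @property
    def layers(self) -> range:
        return range(len(self.blocks))

    def forward(self, inp: torch.Tensor) -> torch.Tensor:
        out = inp
        for block in self.blocks:
            out = block(out)  # output of one layer feeds the next
        return out


stack = RepeatSketch(lambda: nn.Linear(8, 8), n_layers=3)
out = stack(torch.randn(2, 8))
```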
Base classes
- class Resettable(*_, **__)
Bases: Module, ABC
Abstract base class for Modules with a reset_parameters() method.
- class Block(*_, **__)
Bases: Resettable
Abstract base class for neural-network components.
- abstract property device
Return the device that parameters/weights live on.
- abstract property dtype
Return the dtype of parameters/weights.
- abstract property mod_dim
Return the embedding dimension of the module.
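The two abstract base classes can be sketched with the standard abc module. ResettableSketch, BlockSketch, and the toy Scale block are hypothetical names. Declaring reset_parameters() as an abstract method is an assumption, since the text above only says that such a method exists.

```python
from abc import ABC, abstractmethod

import torch
from torch import nn


class ResettableSketch(nn.Module, ABC):
    """Hypothetical sketch: a Module that must implement reset_parameters()."""

    def __init__(self, *_, **__) -> None:
        super().__init__()  # any (keyword) arguments are swallowed

    @abstractmethod
    def reset_parameters(self) -> None:
        """Re-initialize all learnable parameters."""


class BlockSketch(ResettableSketch):
    """Hypothetical sketch: adds the three abstract properties."""

    @property
    @abstractmethod
    def device(self) -> torch.device:
        """Device that parameters/weights live on."""

    @property
    @abstractmethod
    def dtype(self) -> torch.dtype:
        """Dtype of parameters/weights."""

    @property
    @abstractmethod
    def mod_dim(self) -> int:
        """Model dimension."""


class Scale(BlockSketch):
    """Toy concrete block: learnable per-feature scaling."""

    def __init__(self, mod_dim: int) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.ones(mod_dim))

    @property
    def device(self) -> torch.device:
        return self.weight.device

    @property
    def dtype(self) -> torch.dtype:
        return self.weight.dtype

    @property
    def mod_dim(self) -> int:
        return self.weight.numel()

    def reset_parameters(self) -> None:
        nn.init.ones_(self.weight)

    def forward(self, inp: torch.Tensor) -> torch.Tensor:
        return inp * self.weight


block = Scale(3)
```

Instantiating BlockSketch directly raises a TypeError, so every concrete block is forced to provide all four members.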