mix
Combine your embedded features into vectors of size model-dimension.
Two ways of doing this are provided here. One is to form a (weighted) sum of the feature embeddings. When these weights are themselves learnable, they can be interpreted as feature importances. The other is to concatenate the embedding vectors of all features into a single, wide vector and to then project it back down into a space with the same dimension as the embedding space.
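The sketch below illustrates both approaches in plain PyTorch, outside of this package. The shapes, the uniform initial weighting, and all variable names are illustrative assumptions, not part of the API documented on this page.

```python
import torch

# Illustrative shapes: a batch of 32 samples with 5 embedded features,
# each of size mod_dim = 64, stacked along the next-to-last dimension.
batch, n_features, mod_dim = 32, 5, 64
embedded = torch.randn(batch, n_features, mod_dim)

# Approach 1: a (learnable) weighted sum over the feature dimension.
# After training, the weights can be read as feature importances.
weights = torch.nn.Parameter(torch.full((n_features,), 1.0 / n_features))
summed = (weights.view(1, -1, 1) * embedded).sum(dim=-2)
assert summed.shape == (batch, mod_dim)

# Approach 2: concatenate all feature embeddings into one wide vector
# and project it back down to the model dimension.
project = torch.nn.Linear(n_features * mod_dim, mod_dim)
mixed = project(embedded.flatten(start_dim=-2))
assert mixed.shape == (batch, mod_dim)
```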
- class ActivatedConcatMixer(mod_dim, n_features, activate=<function identity>, **kwargs)[source]
Bases:
Resettable
Combine stacked feature vectors through a single dense layer.
Multiple feature vectors stacked in a single tensor are concatenated into a single, wide vector and projected down into a space the size of the model.
- Parameters:
mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
n_features (int) – The number of features to combine. Must be equal to the size of the next-to-last dimension of the input tensor.
activate (Module or function, optional) – The activation function to be applied after linearly combining features. Must be a callable that accepts a tensor as its sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to identity, resulting in no non-linear activation whatsoever.
**kwargs – Keyword arguments are passed on to the linear layer.
- forward(inp)[source]
Forward pass for combining multiple stacked feature vectors.
- Parameters:
inp (Tensor) – The size of the next-to-last dimension of the input tensor is expected to match the n_features provided at instantiation. The last dimension (of size mod_dim) is expected to contain the feature vectors themselves.
- Returns:
The output tensor has one dimension fewer than the input. The next-to-last dimension is dropped and the size of the last dimension is once again mod_dim.
- Return type:
Tensor
- new(mod_dim=None, n_features=None, activate=None, **kwargs)[source]
Return a fresh instance with the same or updated parameters.
- Parameters:
mod_dim (int, optional) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension. Overwrites the mod_dim of the current instance if given. Defaults to None.
n_features (int, optional) – The number of features to combine. Must be equal to the size of the next-to-last dimension of the input tensor. Overwrites the n_features of the current instance if given. Defaults to None.
activate (Module or function, optional) – The activation function to be applied after linearly combining features. Must be a callable that accepts a tensor as its sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Overwrites the activate of the current instance if given. Defaults to None.
**kwargs – Additional keyword arguments are merged into the keyword arguments of the current instance and are then passed on to the linear layer together.
- Returns:
A fresh, new instance of itself.
- Return type:
ActivatedConcatMixer
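A minimal usage sketch of ActivatedConcatMixer, following the shapes documented above. The import path is a placeholder; adjust it to wherever the mix module lives in your installation.

```python
import torch
from mix import ActivatedConcatMixer  # placeholder import path

mod_dim, n_features = 64, 5
mixer = ActivatedConcatMixer(mod_dim, n_features, activate=torch.nn.ReLU())

# The input carries n_features stacked vectors of size mod_dim in its
# last two dimensions; the output drops the next-to-last dimension.
inp = torch.randn(32, n_features, mod_dim)
out = mixer(inp)
assert out.shape == (32, mod_dim)

# new() returns a fresh instance, optionally with updated parameters.
wider = mixer.new(mod_dim=128)
```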
- class GatedConcatMixer(mod_dim, n_features, gate=Sigmoid(), **kwargs)[source]
Bases:
Resettable
Combine stacked feature vectors through a Gated Linear Unit (GLU).
Features are concatenated into a single, wide vector and projected into a space twice the size of the model. One half is then passed through an (optional) activation function to gate the other half, thus reducing the final output back down to the model size.
- Parameters:
mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
n_features (int) – The number of features to combine. Must be equal to the size of the next-to-last dimension of the input tensor.
gate (Module or function, optional) – The activation function to be applied to one half of the (linearly) transformed input before multiplying with the other half. Must be a callable that accepts a tensor as its sole argument, like a module from torch.nn or a function from torch.nn.functional. Defaults to a sigmoid.
**kwargs – Keyword arguments are passed on to the linear layer.
- forward(inp)[source]
Forward pass for combining multiple stacked feature vectors.
- Parameters:
inp (Tensor) – The size of the next-to-last dimension of the input tensor is expected to match the n_features provided at instantiation. The last dimension (of size mod_dim) is expected to contain the feature vectors themselves.
- Returns:
The output tensor has one dimension fewer than the input. The next-to-last dimension is dropped and the size of the last dimension is once again mod_dim.
- Return type:
Tensor
- new(mod_dim=None, n_features=None, gate=None, **kwargs)[source]
Return a fresh instance with the same or updated parameters.
- Parameters:
mod_dim (int, optional) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension. Overwrites the mod_dim of the current instance if given. Defaults to None.
n_features (int, optional) – The number of features to combine. Must be equal to the size of the next-to-last dimension of the input tensor. Overwrites the n_features of the current instance if given. Defaults to None.
gate (Module or function, optional) – The activation function to be applied to one half of the (linearly) transformed input before multiplying with the other half. Must be a callable that accepts a tensor as its sole argument, like a module from torch.nn or a function from torch.nn.functional. Overwrites the gate of the current instance if given. Defaults to None.
**kwargs – Additional keyword arguments are merged into the keyword arguments of the current instance and are then passed on to the linear layer together.
- Returns:
A fresh, new instance of itself.
- Return type:
GatedConcatMixer
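To make the gating concrete, here is a stand-alone sketch of the GLU-style combination described above in plain PyTorch. It is not the module's actual implementation, and which half gates which is an assumption of this sketch.

```python
import torch

batch, n_features, mod_dim = 32, 5, 64
inp = torch.randn(batch, n_features, mod_dim)

# Concatenate the stacked features and project to twice the model size ...
widen = torch.nn.Linear(n_features * mod_dim, 2 * mod_dim)
wide = widen(inp.flatten(start_dim=-2))   # (batch, 2 * mod_dim)

# ... then let one half gate the other, reducing back down to mod_dim.
signal, gate = wide.chunk(2, dim=-1)
out = signal * torch.sigmoid(gate)
assert out.shape == (batch, mod_dim)

# torch.nn.functional.glu performs the same split-and-gate in one call.
```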
- class GatedResidualConcatMixer(mod_dim, n_features, activate=ELU(alpha=1.0), gate=Sigmoid(), drop=Dropout(p=0.0, inplace=False), **kwargs)[source]
Bases:
Resettable
Combine stacked feature vectors through a Gated Residual Network (GRN).
- Parameters:
mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
n_features (int) – The number of features to combine. Must be equal to the size of the next-to-last dimension of the input tensor.
activate (Module or function, optional) – The activation function to be applied after the (linear) transformation, but prior to gating. Must be a callable that accepts a tensor as its sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to an ELU activation.
gate (Module or function, optional) – The activation function to be applied to one half of the (non-linearly) transformed input before multiplying with the other half. Must be a callable that accepts a tensor as its sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.
drop (Module, optional) – Typically an instance of Dropout or AlphaDropout. Defaults to Dropout(p=0.0), resulting in no dropout being applied.
**kwargs – Additional keyword arguments to pass through to the linear layers.
Note
This implementation is inspired by how features are encoded in Temporal Fusion Transformers, [1] but it is not quite the same. Specifically, the intermediate linear layer (Eq. 3) is eliminated and dropout is applied directly to the activations after the first layer. Also, the layer norm (Eq. 2) is replaced by simply dividing the sum of (linearly projected) input and gated signal by 2. Should additional normalization be desired, it can be performed independently on the output of this module.
References
[1] Lim, B., Arık, S. Ö., Loeff, N., & Pfister, T. (2021). Temporal Fusion Transformers for interpretable multi-horizon time series forecasting. International Journal of Forecasting, 37(4), 1748–1764.
- forward(inp)[source]
Forward pass for combining multiple stacked feature vectors.
- Parameters:
inp (Tensor) – The size of the next-to-last dimension of the input tensor is expected to match the n_features provided at instantiation. The last dimension (of size mod_dim) is expected to contain the feature vectors themselves.
- Returns:
The output tensor has one dimension fewer than the input. The next-to-last dimension is dropped and the size of the last dimension is once again mod_dim.
- Return type:
Tensor
- new(mod_dim=None, n_features=None, activate=None, gate=None, drop=None, **kwargs)[source]
Return a fresh instance with the same or updated parameters.
- Parameters:
mod_dim (int, optional) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension. Overwrites the mod_dim of the current instance if given. Defaults to None.
n_features (int, optional) – The number of features to combine. Must be equal to the size of the next-to-last dimension of the input tensor. Overwrites the n_features of the current instance if given. Defaults to None.
activate (Module or function, optional) – The activation function to be applied after the (linear) transformation, but prior to gating. Must be a callable that accepts a tensor as its sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Overwrites the activate of the current instance if given. Defaults to None.
gate (Module or function, optional) – The activation function to be applied to one half of the (linearly) transformed input before multiplying with the other half. Must be a callable that accepts a tensor as its sole argument, like a module from torch.nn or a function from torch.nn.functional. Overwrites the gate of the current instance if given. Defaults to None.
drop (Module, optional) – Typically an instance of Dropout or AlphaDropout. Overwrites the drop of the current instance if given. Defaults to None.
**kwargs – Additional keyword arguments are merged into the keyword arguments of the current instance and are then passed on to the linear layers together.
- Returns:
A fresh, new instance of itself.
- Return type:
GatedResidualConcatMixer
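As a rough guide to the computation sketched in the Note above, the following stand-alone PyTorch snippet wires up the named ingredients: a wide linear projection, ELU, dropout applied directly to the activations, sigmoid gating, and the divide-by-two residual in place of a layer norm. The exact layer wiring is an assumption of this sketch; it is not the module's actual implementation.

```python
import torch

batch, n_features, mod_dim = 32, 5, 64
inp = torch.randn(batch, n_features, mod_dim)
flat = inp.flatten(start_dim=-2)            # (batch, n_features * mod_dim)

project = torch.nn.Linear(n_features * mod_dim, mod_dim)    # residual branch
widen = torch.nn.Linear(n_features * mod_dim, 2 * mod_dim)  # gated branch
drop = torch.nn.Dropout(p=0.0)

# Activate after the first linear layer and apply dropout directly
# to the activations, as described in the Note.
activated = drop(torch.nn.functional.elu(widen(flat)))

# One half gates the other, reducing back down to mod_dim.
signal, gate = activated.chunk(2, dim=-1)
gated = signal * torch.sigmoid(gate)

# Average of projected input and gated signal instead of a layer norm.
out = (project(flat) + gated) / 2
assert out.shape == (batch, mod_dim)
```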