mix

Subpackages

weighted

Combine your embedded features into vectors of size model-dimension.

Two ways of doing this are provided here. One is to form a (weighted) sum of the feature embeddings. When these weights are themselves learnable, they can be interpreted as feature importance. The other is to concatenate the embedding vectors of all features into a single, wide vector and then project it back down into a space with the same dimension as the embedding space.
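As a rough illustration of the two strategies described above, the following sketch uses plain PyTorch tensors rather than the classes of this package. The variable names, the shapes, and the softmax normalization of the weights are assumptions made for the example only.

```python
import torch

torch.manual_seed(0)
batch, n_features, mod_dim = 4, 3, 8

# Stacked feature embeddings: one mod_dim-sized vector per feature.
stacked = torch.randn(batch, n_features, mod_dim)

# Option 1: weighted sum. Softmax-normalized, learnable weights can be read
# as per-feature importance (the normalization is an assumption here).
logits = torch.nn.Parameter(torch.zeros(n_features))
weights = torch.softmax(logits, dim=0)                  # (n_features,)
summed = (weights.unsqueeze(-1) * stacked).sum(dim=-2)  # (batch, mod_dim)

# Option 2: concatenate all features, then project back down to mod_dim.
project = torch.nn.Linear(n_features * mod_dim, mod_dim)
wide = stacked.flatten(start_dim=-2)    # (batch, n_features * mod_dim)
projected = project(wide)               # (batch, mod_dim)
```

Both options map an input of shape (batch, n_features, mod_dim) to an output of shape (batch, mod_dim). With all logits at zero, the weighted sum reduces to a plain average over the features.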

class ActivatedConcatMixer(mod_dim, n_features, activate=<function identity>, **kwargs)

Bases: Resettable

Combine stacked feature vectors through a single dense layer.

Multiple feature vectors stacked in a single tensor are concatenated into a single, wide vector and projected down into a space the size of the model.

Parameters:

mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
n_features (int) – The number of features to combine. Must be equal to the size of the next-to-last dimension of the input tensor.
activate (Module or function, optional) – The activation function to be applied after linearly combining features. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to identity, resulting in no non-linear activation whatsoever.
**kwargs – Keyword arguments are passed on to the linear layer.

forward(inp)

Forward pass for combining multiple stacked feature vectors.

Parameters: inp (Tensor) – The size of the next-to-last dimension of the input tensor is expected to match the n_features provided at instantiation. The last dimension (of size mod_dim) is expected to contain the feature vectors themselves.
Returns: The output tensor has one fewer dimension than the input. The next-to-last dimension is dropped and the size of the last dimension is once again mod_dim.
Return type: Tensor
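The shape contract of the forward pass can be sketched with a small stand-in module. ConcatProjectSketch below is a hypothetical class written for this example, not the library's ActivatedConcatMixer, and its internals (a flatten followed by a single torch.nn.Linear) are an assumption based on the description above.

```python
import torch
import torch.nn as nn


class ConcatProjectSketch(nn.Module):
    """Hypothetical stand-in, not the library's ActivatedConcatMixer."""

    def __init__(self, mod_dim, n_features, activate=nn.Identity(), **kwargs):
        super().__init__()
        self.activate = activate
        # Keyword arguments (e.g. bias=False) are passed to the linear layer.
        self.project = nn.Linear(n_features * mod_dim, mod_dim, **kwargs)

    def forward(self, inp):
        # (..., n_features, mod_dim) -> (..., n_features * mod_dim)
        wide = inp.flatten(start_dim=-2)
        return self.activate(self.project(wide))


mixer = ConcatProjectSketch(mod_dim=8, n_features=3, activate=nn.ReLU(), bias=False)
inp = torch.randn(5, 7, 3, 8)  # e.g. (batch, sequence, n_features, mod_dim)
out = mixer(inp)               # shape (5, 7, 8)
```

Any number of leading dimensions is carried through unchanged. Only the next-to-last dimension (the features) is consumed.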

new(mod_dim=None, n_features=None, activate=None, **kwargs)

Return a fresh instance with the same or updated parameters.

Parameters:

mod_dim (int, optional) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension. Overwrites the mod_dim of the current instance if given. Defaults to None.
n_features (int, optional) – The number of features to combine. Must be equal to the size of the next-to-last dimension of the input tensor. Overwrites n_features of the current instance if given. Defaults to None.
activate (Module or function, optional) – The activation function to be applied after linearly combining features. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Overwrites the activate of the current instance if given. Defaults to None.
**kwargs – Additional keyword arguments are merged into the keyword arguments of the current instance and are then passed through to the linear layer together.

Returns:

A fresh, new instance of itself.

Return type:

ActivatedConcatMixer
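The "same or updated parameters" behavior of new() can be pictured with a minimal, framework-free sketch. MixerConfigSketch is a hypothetical class for illustration only. It assumes that None means "keep the current value" and that newly given keyword arguments take precedence over stored ones when merged.

```python
class MixerConfigSketch:
    """Hypothetical class illustrating the new() pattern only."""

    def __init__(self, mod_dim, n_features, activate=None, **kwargs):
        self.mod_dim = mod_dim
        self.n_features = n_features
        self.activate = activate
        self.kwargs = kwargs

    def new(self, mod_dim=None, n_features=None, activate=None, **kwargs):
        # None means "keep the value of the current instance".
        return self.__class__(
            self.mod_dim if mod_dim is None else mod_dim,
            self.n_features if n_features is None else n_features,
            self.activate if activate is None else activate,
            # Later keys win, so new keyword arguments override stored ones.
            **{**self.kwargs, **kwargs},
        )


old = MixerConfigSketch(mod_dim=8, n_features=3, bias=False)
fresh = old.new(n_features=5, device="cpu")
```

Here, fresh keeps mod_dim=8 and bias=False from old, while n_features is updated and device is added. The original instance is left untouched.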

reset_parameters(): Re-initialize all internal parameters.

class GatedConcatMixer(mod_dim, n_features, gate=Sigmoid(), **kwargs)

Bases: Resettable

Combine stacked feature vectors through a Gated Linear Unit (GLU).

Features are concatenated into a single, wide vector and projected into a space twice the size of the model. One half is then passed through an (optional) activation function to gate the other half, thus reducing the final output back down to the model size.

Parameters:

mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
n_features (int) – The number of features to combine. Must be equal to the size of the next-to-last dimension of the input tensor.
gate (Module or function, optional) – The activation function to be applied to half of the (linearly) transformed inputs before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional. Defaults to a sigmoid.
**kwargs – Keyword arguments are passed on to the linear layer.

forward(inp)

Forward pass for combining multiple stacked feature vectors.

Parameters: inp (Tensor) – The size of the next-to-last dimension of the input tensor is expected to match the n_features provided at instantiation. The last dimension (of size mod_dim) is expected to contain the feature vectors themselves.
Returns: The output tensor has one fewer dimension than the input. The next-to-last dimension is dropped and the size of the last dimension is once again mod_dim.
Return type: Tensor
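The gating mechanism described for this class can be sketched as follows. GatedConcatSketch is a hypothetical stand-in, not the library's implementation. In particular, which half of the projection acts as the gate is an arbitrary choice made for this example.

```python
import torch
import torch.nn as nn


class GatedConcatSketch(nn.Module):
    """Hypothetical stand-in, not the library's GatedConcatMixer."""

    def __init__(self, mod_dim, n_features, gate=nn.Sigmoid(), **kwargs):
        super().__init__()
        self.gate = gate
        # Project the wide, concatenated vector to twice the model size.
        self.widen = nn.Linear(n_features * mod_dim, 2 * mod_dim, **kwargs)

    def forward(self, inp):
        # (..., n_features, mod_dim) -> (..., n_features * mod_dim)
        wide = inp.flatten(start_dim=-2)
        # Split into two halves of size mod_dim each. Which half acts as
        # the gate is an arbitrary choice in this sketch.
        value, gate_input = self.widen(wide).chunk(2, dim=-1)
        return value * self.gate(gate_input)


mixer = GatedConcatSketch(mod_dim=8, n_features=3)
inp = torch.randn(4, 3, 8)
out = mixer(inp)  # shape (4, 8)
```

With the sigmoid gate, this particular split matches what torch.nn.functional.glu computes on the widened projection, namely the first half multiplied by the sigmoid of the second half.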

new(mod_dim=None, n_features=None, gate=None, **kwargs)

Return a fresh instance with the same or updated parameters.

Parameters:

mod_dim (int, optional) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension. Overwrites the mod_dim of the current instance if given. Defaults to None.
n_features (int, optional) – The number of features to combine. Must be equal to the size of the next-to-last dimension of the input tensor. Overwrites n_features of the current instance if given. Defaults to None.
gate (Module or function, optional) – The activation function to be applied to half of the (linearly) transformed input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional. Overwrites the gate of the current instance if given. Defaults to None.
**kwargs – Additional keyword arguments are merged into the keyword arguments of the current instance and are then passed through to the linear layer together.

Returns:

A fresh, new instance of itself.

Return type:

GatedConcatMixer

reset_parameters(): Re-initialize all internal parameters.

class GatedResidualConcatMixer(mod_dim, n_features, activate=ELU(alpha=1.0), gate=Sigmoid(), drop=Dropout(p=0.0, inplace=False), **kwargs)

Bases: Resettable

Combine stacked feature vectors through a Gated Residual Network (GRN).

Parameters:

mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
n_features (int) – The number of features to combine. Must be equal to the size of the next-to-last dimension of the input tensor.
activate (Module or function, optional) – The activation function to be applied after (linear) transformation, but prior to gating. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to an ELU activation.
gate (Module or function, optional) – The activation function to be applied to half of the (non-linearly) transformed input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.
drop (Module, optional) – Typically an instance of Dropout or AlphaDropout. Defaults to Dropout(p=0.0), resulting in no dropout being applied.
**kwargs – Additional keyword arguments to pass through to the linear layers.

Note

This implementation is inspired by how features are encoded in Temporal Fusion Transformers [1], but it is not quite the same. Specifically, the intermediate linear layer (Eq. 3) is eliminated and dropout is applied directly to the activations after the first layer. Also, the layer norm (Eq. 2) is replaced by simply dividing the sum of the (linearly projected) input and the gated signal by 2. Should additional normalization be desired, it can be performed independently on the output of this module.

References

forward(inp)

Forward pass for combining multiple stacked feature vectors.

Parameters: inp (Tensor) – The size of the next-to-last dimension of the input tensor is expected to match the n_features provided at instantiation. The last dimension (of size mod_dim) is expected to contain the feature vectors themselves.
Returns: The output tensor has one fewer dimension than the input. The next-to-last dimension is dropped and the size of the last dimension is once again mod_dim.
Return type: Tensor
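One possible reading of the simplifications listed in the note for this class is sketched below. GatedResidualSketch is a hypothetical stand-in, not the library's implementation. The layer names and the width of the first layer are assumptions, and only the overall data flow (activation, dropout, gating, averaging with the projected input) follows the description.

```python
import torch
import torch.nn as nn


class GatedResidualSketch(nn.Module):
    """Hypothetical stand-in, not the library's GatedResidualConcatMixer."""

    def __init__(self, mod_dim, n_features, activate=nn.ELU(),
                 gate=nn.Sigmoid(), drop=nn.Dropout(0.0), **kwargs):
        super().__init__()
        in_dim = n_features * mod_dim
        self.activate = activate
        self.gate = gate
        self.drop = drop
        # Linear projection of the wide input for the residual connection.
        self.skip = nn.Linear(in_dim, mod_dim, **kwargs)
        # First layer; its width (mod_dim) is an assumption of this sketch.
        self.first = nn.Linear(in_dim, mod_dim, **kwargs)
        # Gating layer producing two halves of size mod_dim each.
        self.widen = nn.Linear(mod_dim, 2 * mod_dim, **kwargs)

    def forward(self, inp):
        # (..., n_features, mod_dim) -> (..., n_features * mod_dim)
        wide = inp.flatten(start_dim=-2)
        # Dropout is applied directly to the activations of the first layer.
        hidden = self.drop(self.activate(self.first(wide)))
        value, gate_input = self.widen(hidden).chunk(2, dim=-1)
        gated = value * self.gate(gate_input)
        # Average of projected input and gated signal instead of a layer norm.
        return 0.5 * (self.skip(wide) + gated)


mixer = GatedResidualSketch(mod_dim=8, n_features=3)
out = mixer(torch.randn(4, 3, 8))  # shape (4, 8)
```

If normalization is wanted, a torch.nn.LayerNorm(mod_dim) could be applied to out separately, in line with the remark in the note.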

new(mod_dim=None, n_features=None, activate=None, gate=None, drop=None, **kwargs)

Return a fresh instance with the same or updated parameters.

Parameters:

mod_dim (int, optional) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension. Overwrites the mod_dim of the current instance if given. Defaults to None.
n_features (int, optional) – The number of features to combine. Must be equal to the size of the next-to-last dimension of the input tensor. Overwrites n_features of the current instance if given. Defaults to None.
activate (Module or function, optional) – The activation function to be applied after (linear) transformation, but prior to gating. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Overwrites the activate of the current instance if given. Defaults to None.
gate (Module or function, optional) – The activation function to be applied to half of the (non-linearly) transformed input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Overwrites the gate of the current instance if given. Defaults to None.
drop (Module, optional) – Typically an instance of Dropout or AlphaDropout. Overwrites the drop of the current instance if given. Defaults to None.
**kwargs – Additional keyword arguments are merged into the keyword arguments of the current instance and are then passed through to the linear layers together.

Returns:

A fresh, new instance of itself.

Return type:

GatedResidualConcatMixer

reset_parameters(): Re-initialize all internal parameters.