weighted
Combine feature embeddings through weighted sums.
Depending on whether these weights are learnable and on whether they depend on the input features, they provide a constant, a global, or a per-instance notion of feature importance.
- class ActivatedSumMixer(mod_dim, n_features, activate=<function identity>, **kwargs)[source]
Bases: Resettable
Combine stacked feature vectors by a per-instance linear combination.
The per-instance coefficients sum to 1 for each data point and can thus be seen as some sort of per-instance feature importance. They are obtained by concatenating all features into a single, wide vector, linearly projecting down to a vector with the same number of elements as there are features to combine, optionally activating, and then applying a softmax.
- Parameters:
mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
n_features (int) – The number of features to combine. Must be equal to the size of the next-to-last dimension of the input tensor.
activate (Module or function, optional) – The activation function to be applied after (linearly) mixing the concatenated features. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to identity, resulting in no non-linear activation whatsoever.
**kwargs – Keyword arguments are passed on to the linear layer.
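The computation described above (concatenate, project down to one score per feature, optionally activate, softmax, then combine) can be sketched in plain PyTorch. This is a minimal illustration under assumptions, not the implementation of ActivatedSumMixer: the names `ActivatedSumSketch`, `identity`, and `project` are hypothetical, and initialization is left at PyTorch defaults.

```python
import torch
from torch import nn


def identity(x):
    """Stand-in for the default activation: return the input unchanged."""
    return x


class ActivatedSumSketch(nn.Module):
    """Hypothetical sketch of a per-instance weighted sum of features."""

    def __init__(self, mod_dim, n_features, activate=identity, **kwargs):
        super().__init__()
        self.activate = activate
        # One score per feature from the concatenated (wide) feature vector.
        self.project = nn.Linear(n_features * mod_dim, n_features, **kwargs)

    def importance(self, inp):
        # (..., n_features, mod_dim) -> (..., n_features * mod_dim)
        wide = inp.flatten(start_dim=-2)
        # The softmax makes the coefficients of each data point sum to 1.
        return torch.softmax(self.activate(self.project(wide)), dim=-1)

    def forward(self, inp):
        # Weight each feature vector and sum over the feature dimension.
        return (self.importance(inp).unsqueeze(-1) * inp).sum(dim=-2)


mixer = ActivatedSumSketch(mod_dim=4, n_features=3, activate=torch.tanh)
out = mixer(torch.randn(5, 3, 4))
print(out.shape)  # torch.Size([5, 4])
```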
- forward(inp)[source]
Forward pass for combining multiple stacked feature vectors.
- Parameters:
inp (Tensor) – Feature vectors stacked into a tensor of at least 2 dimensions. The size of the next-to-last dimension is expected to match the n_features provided at instantiation. The last dimension (of size mod_dim) is expected to contain the feature vectors.
- Returns:
The output tensor has one fewer dimension than the input. The next-to-last dimension is dropped and the last dimension now contains the per-instance (normed) linear combination of all feature vectors.
- Return type:
Tensor
- importance(inp)[source]
Per-instance weights in the normed linear combination of features.
- Parameters:
inp (Tensor) – Feature vectors stacked into a tensor of at least 2 dimensions. The size of the next-to-last dimension is expected to match the n_features provided at instantiation. The last dimension (of size mod_dim) is expected to contain the feature vectors.
- Returns:
The output tensor has one fewer dimension than the input, with the last dimension being dropped. Its last dimension (of size n_features) then holds one weight per feature.
- Return type:
Tensor
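The shape contract of forward and importance can be made concrete with a small example. The helper `combine` and the random stand-in for the importance weights are assumptions made for illustration only; they show how weights with one fewer dimension than the input produce an output that drops the next-to-last dimension.

```python
import torch


def combine(inp, weights):
    """Weighted sum over the next-to-last (feature) dimension of ``inp``."""
    return (weights.unsqueeze(-1) * inp).sum(dim=-2)


# Two leading dimensions, 3 features to combine, mod_dim of 16.
inp = torch.randn(2, 7, 3, 16)
# Random stand-in for what importance(inp) would return.
weights = torch.softmax(torch.randn(2, 7, 3), dim=-1)
out = combine(inp, weights)

print(weights.shape)  # torch.Size([2, 7, 3])
print(out.shape)      # torch.Size([2, 7, 16])
```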
- new(mod_dim=None, n_features=None, activate=None, **kwargs)[source]
Return a fresh instance with the same or updated parameters.
- Parameters:
mod_dim (int, optional) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension. Overwrites the mod_dim of the current instance if given. Defaults to None.
n_features (int, optional) – The number of features to combine. Must be equal to the size of the next-to-last dimension of the input tensor. Overwrites n_features of the current instance if given. Defaults to None.
activate (Module or function, optional) – The activation function to be applied after (linearly) mixing the concatenated features. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Overwrites the activate of the current instance if given. Defaults to None.
**kwargs – Additional keyword arguments are merged into the keyword arguments of the current instance and are then passed through to the linear layer together.
- Returns:
A fresh, new instance of itself.
- Return type:
ActivatedSumMixer
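The "same or updated parameters" behaviour of new can be illustrated with a small, framework-free sketch. The class `MixerSketch` is hypothetical, and letting newly passed keyword arguments take precedence over stored ones is an assumption; the text above only states that they are merged.

```python
class MixerSketch:
    """Hypothetical class showing the 'same or updated parameters' pattern."""

    def __init__(self, mod_dim, n_features, activate=None, **kwargs):
        self.mod_dim = mod_dim
        self.n_features = n_features
        self.activate = activate
        self.kwargs = kwargs

    def new(self, mod_dim=None, n_features=None, activate=None, **kwargs):
        # Fall back to the current value whenever an argument is None.
        return self.__class__(
            self.mod_dim if mod_dim is None else mod_dim,
            self.n_features if n_features is None else n_features,
            self.activate if activate is None else activate,
            # Assumed: new keyword arguments win over stored ones.
            **{**self.kwargs, **kwargs},
        )


base = MixerSketch(mod_dim=16, n_features=3, bias=False)
wider = base.new(mod_dim=32, dtype="float64")
print(wider.mod_dim, wider.n_features, wider.kwargs)
# 32 3 {'bias': False, 'dtype': 'float64'}
```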
- class ConstantSumMixer(n_features)[source]
Bases: Resettable
Combine stacked feature vectors by simply adding them.
The sum is then “normed” through dividing by the number of features.
- Parameters:
n_features (int) – The number of features to combine. Must be equal to the size of the next-to-last dimension of the input tensor.
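As a worked example of the normed sum, the sketch below adds three two-dimensional feature vectors and divides by their number. The function name `constant_sum` is hypothetical and only mirrors the description above.

```python
import torch


def constant_sum(inp):
    """Sum over the feature dimension, divided by the number of features."""
    n_features = inp.shape[-2]
    return inp.sum(dim=-2) / n_features


# One data point with 3 features of size 2 each, shape (1, 3, 2).
inp = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]])
print(constant_sum(inp))  # tensor([[3., 4.]])
```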
- forward(inp)[source]
Add stacked feature vectors with constant and equal weights.
- Parameters:
inp (Tensor) – Feature vectors stacked into a tensor of at least 2 dimensions. The size of the next-to-last dimension is expected to match the n_features provided at instantiation. The last dimension is expected to contain the feature vectors themselves.
- Returns:
The output tensor has one fewer dimension than the input. The next-to-last dimension is dropped and the last dimension now contains the (normed) sum of all feature vectors.
- Return type:
Tensor
- importance(inp)[source]
Constant feature weights in the normed sum over all features.
- Parameters:
inp (Tensor) – Feature vectors stacked into a tensor of at least 2 dimensions. The size of the next-to-last dimension is expected to match the n_features provided at instantiation. The last dimension is expected to contain the feature vectors themselves.
- Returns:
The output tensor has one fewer dimension than the input, with the last dimension being dropped.
- Return type:
Tensor
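With constant and equal weights, the importance is presumably the same value, one over the number of features, for every feature of every data point. A minimal sketch under that assumption, with the hypothetical helper name `constant_importance`:

```python
import torch


def constant_importance(inp):
    """Equal weight 1 / n_features for each feature of each data point."""
    n_features = inp.shape[-2]
    # Drop the last dimension of the input shape and fill with the weight.
    return torch.full(
        inp.shape[:-1],
        1.0 / n_features,
        dtype=inp.dtype,
        device=inp.device,
    )


weights = constant_importance(torch.randn(4, 3, 8))
print(weights.shape)  # torch.Size([4, 3])
```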
- new(n_features=None)[source]
Return a fresh instance with the same or updated parameters.
- Parameters:
n_features (int, optional) – The number of features to combine. Must be equal to the size of the next-to-last dimension of the input tensor. Overwrites n_features of the current instance if given. Defaults to None.
- Returns:
A fresh, new instance of itself.
- Return type:
ConstantSumMixer
- class GatedResidualSumMixer(mod_dim, n_features, activate=ELU(alpha=1.0), gate=Sigmoid(), drop=Dropout(p=0.0, inplace=False), **kwargs)[source]
Bases: Resettable
Combine stacked feature vectors by a per-instance linear combination.
The per-instance coefficients sum to 1 for each data point and can thus be seen as some sort of per-instance feature importance. They are obtained by concatenating all features into a single, wide vector, and then passing it through a Gated Residual Network (GRN), [1] such that the size of the output equals the number of features to combine.
- Parameters:
mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
n_features (int) – The number of features to combine. Must be equal to the size of the next-to-last dimension of the input tensor.
activate (Module or function, optional) – The activation function to be applied after (linear) transformation, but prior to gating. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to an ELU activation.
gate (Module or function, optional) – The activation function to be applied to half of the (non-linearly) transformed input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.
drop (Module, optional) – Typically an instance of Dropout or AlphaDropout. Defaults to Dropout(p=0.0), resulting in no dropout being applied.
**kwargs – Additional keyword arguments to pass through to the linear layers.
Note
This implementation is inspired by how features are combined in Temporal Fusion Transformers, [1] but it is not quite the same. Firstly, the inputs are not the raw feature embeddings but the final feature embeddings to be linearly combined. Secondly, the intermediate linear layer (Eq. 3) is eliminated and dropout is applied directly to the activations after the first layer. Finally, the layer norm (Eq. 2) is omitted because normalizing right before passing through a softmax seems unnecessary.
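One possible reading of the simplified network described in this note is sketched below. It is an illustration under several assumptions and not the implementation of GatedResidualSumMixer: the class and attribute names are hypothetical, the width of the first layer is assumed to equal the width of the concatenated input, and the residual path is assumed to be a linear projection down to n_features because input and output sizes differ.

```python
import torch
from torch import nn


class GatedResidualSketch(nn.Module):
    """Hypothetical reading of the simplified GRN; layer sizes are assumed."""

    def __init__(self, mod_dim, n_features, activate=None, gate=None, drop=None):
        super().__init__()
        in_dim = n_features * mod_dim
        self.activate = nn.ELU() if activate is None else activate
        self.gate = nn.Sigmoid() if gate is None else gate
        self.drop = nn.Dropout(p=0.0) if drop is None else drop
        # First layer, followed directly by activation and dropout.
        self.widen = nn.Linear(in_dim, in_dim)
        # Twice as many outputs as features: one half gates the other.
        self.project = nn.Linear(in_dim, 2 * n_features)
        # Assumed residual path, projecting the input to n_features.
        self.skip = nn.Linear(in_dim, n_features)

    def importance(self, inp):
        wide = inp.flatten(start_dim=-2)
        hidden = self.drop(self.activate(self.widen(wide)))
        gating, values = self.project(hidden).chunk(2, dim=-1)
        scores = self.gate(gating) * values + self.skip(wide)
        # No layer norm: the softmax normalizes the scores directly.
        return torch.softmax(scores, dim=-1)

    def forward(self, inp):
        return (self.importance(inp).unsqueeze(-1) * inp).sum(dim=-2)


mixer = GatedResidualSketch(mod_dim=4, n_features=3)
print(mixer(torch.randn(2, 5, 3, 4)).shape)  # torch.Size([2, 5, 4])
```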
References
- forward(inp)[source]
Forward pass for combining multiple stacked feature vectors.
- Parameters:
inp (Tensor) – Feature vectors stacked into a tensor of at least 2 dimensions. The size of the next-to-last dimension is expected to match the n_features provided at instantiation. The last dimension (of size mod_dim) is expected to contain the feature vectors.
- Returns:
The output tensor has one fewer dimension than the input. The next-to-last dimension is dropped and the last dimension now contains the per-instance (normed) linear combination of all feature vectors.
- Return type:
Tensor
- importance(inp)[source]
Per-instance weights in the normed linear combination of features.
- Parameters:
inp (Tensor) – Feature vectors stacked into a tensor of at least 2 dimensions. The size of the next-to-last dimension is expected to match the n_features provided at instantiation. The last dimension (of size mod_dim) is expected to contain the feature vectors.
- Returns:
The output tensor has one fewer dimension than the input, with the last dimension being dropped.
- Return type:
Tensor
- new(mod_dim=None, n_features=None, activate=None, gate=None, drop=None, **kwargs)[source]
Return a fresh instance with the same or updated parameters.
- Parameters:
mod_dim (int, optional) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension. Overwrites the mod_dim of the current instance if given. Defaults to None.
n_features (int, optional) – The number of features to combine. Must be equal to the size of the next-to-last dimension of the input tensor. Overwrites n_features of the current instance if given. Defaults to None.
activate (Module or function, optional) – The activation function to be applied after (linear) transform, but prior to gating. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Overwrites activate of the current instance if given. Defaults to None.
gate (Module or function, optional) – The activation function to be applied to half of the (linearly) transformed input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional. Overwrites the gate of the current instance if given. Defaults to None.
drop (Module, optional) – Typically an instance of Dropout or AlphaDropout. Overwrites the drop of the current instance if given. Defaults to None.
**kwargs – Additional keyword arguments are merged into the keyword arguments of the current instance and are then passed through to the linear layers together.
- Returns:
A fresh, new instance of itself.
- Return type:
GatedResidualSumMixer
- class GatedSumMixer(mod_dim, n_features, gate=Sigmoid(), **kwargs)[source]
Bases: Resettable
Combine stacked feature vectors by a per-instance linear combination.
The per-instance coefficients sum to 1 for each data point and can thus be seen as some sort of per-instance feature importance. They are obtained by concatenating all features into a single, wide vector, and linearly projecting it down to a vector with twice as many elements as there are features. One half is then passed through an (optional) activation function to gate the other half, thus reducing the final output back down to the number of features to combine.
- Parameters:
mod_dim (int) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension.
n_features (int) – The number of features to combine. Must be equal to the size of the next-to-last dimension of the input tensor.
gate (Module or function, optional) – The activation function to be applied to half of the (linearly) transformed inputs before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional. Defaults to a sigmoid.
**kwargs – Keyword arguments are passed on to the linear layer.
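The gating step described above can be sketched as follows. This is an illustration under assumptions rather than the implementation of GatedSumMixer: the names are hypothetical, which half of the projection gates the other is an arbitrary choice here, and a softmax is assumed as the final step that makes the coefficients sum to 1.

```python
import torch
from torch import nn


class GatedSumSketch(nn.Module):
    """Hypothetical sketch: project to 2 * n_features, gate, then softmax."""

    def __init__(self, mod_dim, n_features, gate=None, **kwargs):
        super().__init__()
        self.gate = nn.Sigmoid() if gate is None else gate
        # Twice as many outputs as there are features to combine.
        self.project = nn.Linear(n_features * mod_dim, 2 * n_features, **kwargs)

    def importance(self, inp):
        wide = inp.flatten(start_dim=-2)
        # Split into the half that gates and the half that is gated.
        gating, values = self.project(wide).chunk(2, dim=-1)
        return torch.softmax(self.gate(gating) * values, dim=-1)

    def forward(self, inp):
        return (self.importance(inp).unsqueeze(-1) * inp).sum(dim=-2)


mixer = GatedSumSketch(mod_dim=8, n_features=4, bias=False)
print(mixer.importance(torch.randn(6, 4, 8)).shape)  # torch.Size([6, 4])
```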
- forward(inp)[source]
Forward pass for combining multiple stacked feature vectors.
- Parameters:
inp (Tensor) – Feature vectors stacked into a tensor of at least 2 dimensions. The size of the next-to-last dimension is expected to match the n_features provided at instantiation. The last dimension (of size mod_dim) is expected to contain the feature vectors.
- Returns:
The output tensor has one fewer dimension than the input. The next-to-last dimension is dropped and the last dimension now contains the per-instance (normed) linear combination of all feature vectors.
- Return type:
Tensor
- importance(inp)[source]
Per-instance weights in the normed linear combination of features.
- Parameters:
inp (Tensor) – Feature vectors stacked into a tensor of at least 2 dimensions. The size of the next-to-last dimension is expected to match the n_features provided at instantiation. The last dimension (of size mod_dim) is expected to contain the feature vectors.
- Returns:
The output tensor has one fewer dimension than the input, with the last dimension being dropped.
- Return type:
Tensor
- new(mod_dim=None, n_features=None, gate=None, **kwargs)[source]
Return a fresh instance with the same or updated parameters.
- Parameters:
mod_dim (int, optional) – Size of the feature space. The input tensor is expected to be of that size in its last dimension and the output will again have this size in its last dimension. Overwrites the mod_dim of the current instance if given. Defaults to None.
n_features (int, optional) – The number of features to combine. Must be equal to the size of the next-to-last dimension of the input tensor. Overwrites n_features of the current instance if given. Defaults to None.
gate (Module or function, optional) – The activation function to be applied to half of the (linearly) transformed input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional. Overwrites the gate of the current instance if given. Defaults to None.
**kwargs – Additional keyword arguments are merged into the keyword arguments of the current instance and are then passed through to the linear layer together.
- Returns:
A fresh, new instance of itself.
- Return type:
GatedSumMixer
- class VariableSumMixer(n_features)[source]
Bases: Resettable
Combine stacked feature vectors through a learnable linear combination.
Specifically, a single, global set of linear-combination coefficients is learned. These coefficients sum to 1 and can thus be seen as some sort of feature importance.
- Parameters:
n_features (int) – The number of features to combine. Must be equal to the size of the next-to-last dimension of the input tensor.
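A single, global set of coefficients that sums to 1 can be obtained, for example, by passing unconstrained learnable scores through a softmax. The sketch below assumes exactly that; the class name, the `scores` parameter, and the zero initialization (which yields equal weights at the start) are hypothetical and not taken from VariableSumMixer itself.

```python
import torch
from torch import nn


class VariableSumSketch(nn.Module):
    """Hypothetical sketch of a learnable, input-independent weighted sum."""

    def __init__(self, n_features):
        super().__init__()
        # Unconstrained scores; zeros give equal weights initially (assumed).
        self.scores = nn.Parameter(torch.zeros(n_features))

    def importance(self, inp):
        weights = torch.softmax(self.scores, dim=-1)
        # Broadcast the same global weights to every data point.
        return weights.expand(inp.shape[:-1])

    def forward(self, inp):
        return (self.importance(inp).unsqueeze(-1) * inp).sum(dim=-2)


mixer = VariableSumSketch(n_features=3)
inp = torch.randn(4, 3, 8)
print(mixer.importance(inp).shape)  # torch.Size([4, 3])
print(mixer(inp).shape)             # torch.Size([4, 8])
```

Because the weights do not depend on the input, every row of the importance tensor is identical.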
- forward(inp)[source]
Linearly combine stacked feature vectors with global coefficients.
- Parameters:
inp (Tensor) – Feature vectors stacked into a tensor of at least 2 dimensions. The size of the next-to-last dimension is expected to match the n_features provided at instantiation. The last dimension is expected to contain the feature vectors themselves.
- Returns:
The output tensor has one fewer dimension than the input. The next-to-last dimension is dropped and the last dimension now contains the (normed) linear combination of all feature vectors.
- Return type:
Tensor
- importance(inp)[source]
Learned, global feature weights in the normed sum over all features.
- Parameters:
inp (Tensor) – Feature vectors stacked into a tensor of at least 2 dimensions. The size of the next-to-last dimension is expected to match the n_features provided at instantiation. The last dimension is expected to contain the feature vectors themselves.
- Returns:
The output tensor has one fewer dimension than the input, with the last dimension being dropped.
- Return type:
Tensor
- new(n_features=None)[source]
Return a fresh instance with the same or updated parameters.
- Parameters:
n_features (int, optional) – The number of features to combine. Must be equal to the size of the next-to-last dimension of the input tensor. Overwrites n_features of the current instance if given. Defaults to None.
- Returns:
A fresh, new instance of itself.
- Return type:
VariableSumMixer