embed

Flexibly project your features into embedding space.

The first step in many modern neural-network architectures is to transform input features into vectors in an embedding space with a certain number of dimensions, the “model dimension” (or overall “bus width” of the model). This subpackage provides several ways to do that for both numerical and categorical features so that, when combined, all are treated on equal footing.

class ActivatedEmbedder(mod_dim, activate=<function identity>, inp_dim=1, **kwargs)[source]

Bases: Resettable

Simple linear projection of an individual feature into embedding space.

Parameters:

mod_dim (int) – Desired embedding size. Will become the size of the last dimension of the output tensor.
activate (Module or function, optional) – The activation function to be applied after (linear) projection into embedding space. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to identity, resulting in no non-linear activation whatsoever.
inp_dim (int, optional) – The number of features to embed together. Defaults to 1.
**kwargs – Additional keyword arguments to pass through to the linear layer.

forward(inp)[source]

Embed a single numerical feature through a (non-)linear projection.

Parameters: inp (Tensor) – The last dimension of the input tensor is typically expected to be of size 1 and to contain the numerical value of a single feature. If inp_dim was explicitly set to a value > 1 on instantiation, the size of the last dimension must match inp_dim, the number of numerical features to embed together.
Returns: The output has the same number of dimensions as the input with the size of the last dimension changed to the specified mod_dim.
Return type: Tensor
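
The shape contract described above can be illustrated with a minimal sketch in plain PyTorch. The class name ActivatedSketch is hypothetical and is not part of this package; it only assumes that the embedder wraps a single torch.nn.Linear layer followed by the given activation.

```python
import torch
from torch import nn


class ActivatedSketch(nn.Module):
    """Hypothetical stand-in: linear projection followed by an activation."""

    def __init__(self, mod_dim, activate=nn.Identity(), inp_dim=1, **kwargs):
        super().__init__()
        self.activate = activate
        # Extra keyword arguments (e.g. bias=False) go to the linear layer.
        self.project = nn.Linear(inp_dim, mod_dim, **kwargs)

    def forward(self, inp):
        # Only the last dimension changes: inp_dim -> mod_dim.
        return self.activate(self.project(inp))


embed = ActivatedSketch(mod_dim=8, activate=torch.tanh)
batch = torch.rand(32, 10, 1)  # (batch, sequence, one numerical feature)
out = embed(batch)
print(out.shape)  # torch.Size([32, 10, 8])
```

All leading dimensions are passed through untouched, so the same module works for inputs with or without a sequence dimension.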

new(mod_dim=None, activate=None, inp_dim=None, **kwargs)[source]

Return a fresh instance with the same or updated parameters.

Parameters:

mod_dim (int, optional) – Desired embedding size. Will become the size of the last dimension of the output tensor. Overwrites the mod_dim of the current instance if given. Defaults to None.
activate (Module or function, optional) – The activation function to be applied after (linear) projection into embedding space. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Overwrites the activate of the current instance if given. Defaults to None.
inp_dim (int, optional) – The number of features to embed together. Overwrites the inp_dim of the current instance if given. Defaults to None.
**kwargs – Additional keyword arguments are merged into the keyword arguments of the current instance and are then passed through to the linear layer together.

Returns:

A fresh, new instance of itself.

Return type:

ActivatedEmbedder
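
The "same or updated parameters" behaviour of new can be sketched in pure Python. NewSketch is a hypothetical class that only mimics the bookkeeping (it builds no layers), and the way keyword arguments are merged is an assumption based on the parameter descriptions above.

```python
class NewSketch:
    """Hypothetical stand-in showing only the bookkeeping behind ``new``."""

    def __init__(self, mod_dim, activate=abs, inp_dim=1, **kwargs):
        self.mod_dim = mod_dim
        self.activate = activate
        self.inp_dim = inp_dim
        self.kwargs = kwargs

    def new(self, mod_dim=None, activate=None, inp_dim=None, **kwargs):
        # None means "keep the value of the current instance".
        return self.__class__(
            self.mod_dim if mod_dim is None else mod_dim,
            self.activate if activate is None else activate,
            self.inp_dim if inp_dim is None else inp_dim,
            # Newly given keyword arguments win over stored ones.
            **{**self.kwargs, **kwargs},
        )


base = NewSketch(16, bias=False)
wide = base.new(mod_dim=64, dtype="float64")
print(wide.mod_dim, wide.inp_dim, wide.kwargs)
# 64 1 {'bias': False, 'dtype': 'float64'}
```

The original instance is left unchanged, which makes it usable as a template for several differently sized copies.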

reset_parameters()[source]: Re-initialize all internal parameters.

class GatedEmbedder(mod_dim, gate=Sigmoid(), inp_dim=1, **kwargs)[source]

Bases: Resettable

Flexible Gated Linear Unit (GLU) for embedding a numerical feature.

Parameters:

mod_dim (int) – Desired embedding size. Will become the size of the last dimension of the output tensor.
gate (Module or function, optional) – The activation function to be applied to half of the (linearly) projected input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.
inp_dim (int, optional) – The number of features to embed together. Defaults to 1.
**kwargs – Additional keyword arguments to pass through to the linear layer.

forward(inp)[source]

Embed a single numerical feature through a Gated Linear Unit (GLU).

Parameters: inp (Tensor) – The last dimension of the input tensor is typically expected to be of size 1 and to contain the numerical value of a single feature. If inp_dim was explicitly set to a value > 1 on instantiation, the size of the last dimension must match inp_dim, the number of numerical features to embed together.
Returns: The output has the same number of dimensions as the input with the size of the last dimension changed to the specified mod_dim.
Return type: Tensor
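
A gated linear unit of this kind can be sketched as follows. GatedSketch is a hypothetical name, and which half of the projection passes through the gate is an assumption; the sketch follows the convention of torch.nn.functional.glu, where the second half is gated.

```python
import torch
from torch import nn
from torch.nn import functional as F


class GatedSketch(nn.Module):
    """Hypothetical stand-in: project to 2 * mod_dim, then gate one half."""

    def __init__(self, mod_dim, gate=torch.sigmoid, inp_dim=1, **kwargs):
        super().__init__()
        self.gate = gate
        # Project to twice the model dimension so the result can be halved.
        self.widen = nn.Linear(inp_dim, 2 * mod_dim, **kwargs)

    def forward(self, inp):
        signal, control = self.widen(inp).chunk(2, dim=-1)
        # Assumption: the gate acts on the second half of the projection.
        return signal * self.gate(control)


embed = GatedSketch(mod_dim=8)
inp = torch.rand(4, 1)
out = embed(inp)
print(out.shape)  # torch.Size([4, 8])

# With a sigmoid gate, this matches PyTorch's built-in GLU.
print(torch.allclose(out, F.glu(embed.widen(inp), dim=-1)))  # True
```

Swapping the gate for another callable, for example torch.tanh, changes only the multiplicative factor, not the output shape.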

new(mod_dim=None, gate=None, inp_dim=None, **kwargs)[source]

Return a fresh instance with the same or updated parameters.

Parameters:

mod_dim (int, optional) – Desired embedding size. Will become the size of the last dimension of the output tensor. Overwrites the mod_dim of the current instance if given. Defaults to None.
gate (Module or function, optional) – The activation function to be applied to half of the (linearly) projected input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional. Overwrites the gate of the current instance if given. Defaults to None.
inp_dim (int, optional) – The number of features to embed together. Overwrites the inp_dim of the current instance if given. Defaults to None.
**kwargs – Additional keyword arguments are merged into the keyword arguments of the current instance and are then passed through to the linear layer together.

Returns:

A fresh, new instance of itself.

Return type:

GatedEmbedder

reset_parameters()[source]: Re-initialize all internal parameters.

class GatedResidualEmbedder(mod_dim, activate=ELU(alpha=1.0), gate=Sigmoid(), drop=Dropout(p=0.0, inplace=False), inp_dim=1, **kwargs)[source]

Bases: Resettable

Gated Residual Network (GRN) for embedding numerical features.

Parameters:

mod_dim (int) – Desired embedding size. Will become the size of the last dimension of the output tensor.
activate (Module or function, optional) – The activation function to be applied after (linear) projection into embedding space, but prior to gating. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to an ELU activation.
gate (Module or function, optional) – The activation function to be applied to half of the (non-linearly) projected input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.
drop (Module, optional) – Typically an instance of Dropout or AlphaDropout. Defaults to Dropout(p=0.0), resulting in no dropout being applied.
inp_dim (int, optional) – The number of features to embed together. Defaults to 1.
**kwargs – Additional keyword arguments to pass through to the linear layers.

Note

This implementation is inspired by how features are encoded in Temporal Fusion Transformers [1], but it is not quite the same. Firstly, the (linear) projection of scalar numerical features into embedding space happens inside the present module. Secondly, this embedding vector is not transformed again (as Eq. 4 seems to imply) and there is no option to add a context vector. Thirdly, the intermediate linear layer (Eq. 3) is eliminated and dropout is applied directly to the activations after the first layer. Finally, the layer norm (Eq. 2) is replaced by simply dividing the sum of (linearly projected) input and gated signal by 2. Should additional normalization be desired, it can be performed independently on the output of this module.

References

forward(inp)[source]

Embed a numerical feature through a Gated Residual Network (GRN).

Parameters: inp (Tensor) – The last dimension of the input tensor is typically expected to be of size 1 and to contain the numerical value of a single feature. If inp_dim was explicitly set to a value > 1 on instantiation, the size of the last dimension must match inp_dim, the number of numerical features to embed together.
Returns: The output has the same number of dimensions as the input with the size of the last dimension changed to the specified mod_dim.
Return type: Tensor
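
One plausible reading of the note on this class can be sketched as follows. GatedResidualSketch is hypothetical, and the layer names, the placement of the second linear layer, and the choice of which half is gated are all assumptions rather than the actual implementation.

```python
import torch
from torch import nn


class GatedResidualSketch(nn.Module):
    """Hypothetical stand-in for the simplified GRN described in the note."""

    def __init__(self, mod_dim, activate=nn.ELU(), gate=nn.Sigmoid(),
                 drop=nn.Dropout(0.0), inp_dim=1, **kwargs):
        super().__init__()
        self.activate = activate
        self.gate = gate
        self.drop = drop
        # First layer: project the raw feature(s) into embedding space.
        self.project = nn.Linear(inp_dim, mod_dim, **kwargs)
        # Second layer: widen to 2 * mod_dim for the gated linear unit.
        self.widen = nn.Linear(mod_dim, 2 * mod_dim, **kwargs)

    def forward(self, inp):
        projected = self.project(inp)
        # Dropout acts directly on the activations after the first layer.
        hidden = self.drop(self.activate(projected))
        signal, control = self.widen(hidden).chunk(2, dim=-1)
        gated = signal * self.gate(control)
        # Residual connection, divided by 2 in place of a layer norm.
        return (projected + gated) / 2


embed = GatedResidualSketch(mod_dim=8, drop=nn.Dropout(0.1))
embed.eval()  # disable dropout for a deterministic forward pass
inp = torch.rand(5, 3, 1)
out = embed(inp)
print(out.shape)  # torch.Size([5, 3, 8])
```

If additional normalization is desired, a torch.nn.LayerNorm(8) could be applied to out, in line with the note's suggestion to normalize independently.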

new(mod_dim=None, activate=None, gate=None, drop=None, inp_dim=None, **kwargs)[source]

Return a fresh instance with the same or updated parameters.

Parameters:

mod_dim (int, optional) – Desired embedding size. Will become the size of the last dimension of the output tensor. Overwrites the mod_dim of the current instance if given. Defaults to None.
activate (Module or function, optional) – The activation function to be applied after (linear) projection into embedding space, but prior to gating. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Overwrites the activate of the current instance if given. Defaults to None.
gate (Module or function, optional) – The activation function to be applied to half of the (non-linearly) projected input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional. Overwrites the gate of the current instance if given. Defaults to None.
drop (Module, optional) – Typically an instance of Dropout or AlphaDropout. Overwrites the drop of the current instance if given. Defaults to None.
inp_dim (int, optional) – The number of features to embed together. Overwrites the inp_dim of the current instance if given. Defaults to None.
**kwargs – Additional keyword arguments are merged into the keyword arguments of the current instance and are then passed through to the linear layers together.

Returns:

A fresh, new instance of itself.

Return type:

GatedResidualEmbedder

reset_parameters()[source]: Re-initialize all internal parameters.

class NumericalEmbedder(mod_dim, n_features, emb_cls, *args, **kwargs)[source]

Bases: Resettable

Transform (scalar) numerical features into embedding vectors.

Parameters:

mod_dim (int) – Desired embedding size. Will become the size of the last dimension of the output tensor.
n_features (int) – Number of features to embed, which must equal the size of the last dimension of the input tensor.
emb_cls (type) – The PyTorch module to use as embedding class. Must take mod_dim as its first argument on instantiation, take tensors of size 1 in its last dimension, and change that dimension to size mod_dim.
*args – Additional arguments to use when instantiating emb_cls.
**kwargs – Additional keyword arguments to use when instantiating emb_cls.
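
The requirements on emb_cls suggest that each scalar feature gets its own embedder instance. The following is a minimal sketch under that assumption. NumericalSketch and ScalarLinear are hypothetical names, and stacking the per-feature embeddings along a new second-to-last dimension is an assumption that the text above does not confirm.

```python
import torch
from torch import nn


class ScalarLinear(nn.Linear):
    """Hypothetical emb_cls: takes mod_dim first, maps size 1 to mod_dim."""

    def __init__(self, mod_dim, **kwargs):
        super().__init__(1, mod_dim, **kwargs)


class NumericalSketch(nn.Module):
    """Hypothetical stand-in: one emb_cls instance per numerical feature."""

    def __init__(self, mod_dim, n_features, emb_cls, *args, **kwargs):
        super().__init__()
        self.embed = nn.ModuleList(
            [emb_cls(mod_dim, *args, **kwargs) for _ in range(n_features)]
        )

    def forward(self, inp):
        # Slicing with i:i + 1 keeps the last dimension at size 1.
        embedded = [emb(inp[..., i:i + 1]) for i, emb in enumerate(self.embed)]
        # Assumption: stack to shape (..., n_features, mod_dim).
        return torch.stack(embedded, dim=-2)


embed = NumericalSketch(8, 3, ScalarLinear, bias=False)
out = embed(torch.rand(32, 3))
print(out.shape)  # torch.Size([32, 3, 8])
```

Giving every feature its own weights means that features on different scales are not forced through a shared projection.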