embed

Flexibly project your features into embedding space.

The first step in many modern neural-network architectures is to transform input features into vectors in an embedding space with a certain number of dimensions, the “model dimension” (or overall “bus width” of the model). This subpackage provides several ways to do that for both numerical and categorical features so that, when combined, all are treated on equal footing.

class ActivatedEmbedder(mod_dim, activate=<function identity>, inp_dim=1, **kwargs)

Bases: Resettable

Simple linear projection of an individual feature into embedding space.

Parameters:
  • mod_dim (int) – Desired embedding size. Will become the size of the last dimension of the output tensor.

  • activate (Module or function, optional) – The activation function to be applied after (linear) projection into embedding space. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to identity, resulting in no non-linear activation whatsoever.

  • inp_dim (int, optional) – The number of features to embed together. Defaults to 1.

  • **kwargs – Additional keyword arguments to pass through to the linear layer.

forward(inp)

Embed a single numerical feature through a (non-)linear projection.

Parameters:

inp (Tensor) – The last dimension of the input tensor is typically expected to be of size 1 and to contain the numerical value of a single feature. In case inp_dim was explicitly set to a value > 1 on instantiation, the size of the last dimension must match inp_dim, the number of numerical features to embed together.

Returns:

The output has the same number of dimensions as the input with the size of the last dimension changed to the specified mod_dim.

Return type:

Tensor
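The behavior described above can be sketched in plain PyTorch. This is a minimal illustration, not the actual implementation: the class name ActivatedProjection and its attribute names are hypothetical, and the only facts taken from the documentation are the constructor arguments and the shape contract of forward.

```python
import torch
from torch import nn


class ActivatedProjection(nn.Module):
    """Hypothetical stand-in: linear projection followed by an activation."""

    def __init__(self, mod_dim, activate=nn.Identity(), inp_dim=1, **kwargs):
        super().__init__()
        # Extra keyword arguments (e.g., bias=False) go to the linear layer.
        self.project = nn.Linear(inp_dim, mod_dim, **kwargs)
        self.activate = activate

    def forward(self, inp):
        # Only the last dimension changes: inp_dim -> mod_dim.
        return self.activate(self.project(inp))


embed = ActivatedProjection(mod_dim=8, activate=torch.tanh)
inp = torch.rand(32, 10, 1)  # batch of 32, sequence of 10, one scalar feature
out = embed(inp)
print(out.shape)  # torch.Size([32, 10, 8])
```

Passing a function such as torch.tanh works as well as passing a module such as nn.Tanh(), since both are callables that accept a tensor as sole argument.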

new(mod_dim=None, activate=None, inp_dim=None, **kwargs)

Return a fresh instance with the same or updated parameters.

Parameters:
  • mod_dim (int, optional) – Desired embedding size. Will become the size of the last dimension of the output tensor. Overwrites the mod_dim of the current instance if given. Defaults to None.

  • activate (Module or function, optional) – The activation function to be applied after (linear) projection into embedding space. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Overwrites the activate of the current instance if given. Defaults to None.

  • inp_dim (int, optional) – The number of features to embed together. Overwrites the inp_dim of the current instance if given. Defaults to None.

  • **kwargs – Additional keyword arguments are merged into the keyword arguments of the current instance and are then passed through to the linear layer together.

Returns:

A fresh, new instance of itself.

Return type:

ActivatedEmbedder
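The "same or updated parameters" semantics of new can be sketched as follows. The class Projection and the way it stores its arguments are assumptions made for illustration only; the documented behavior being mirrored is that arguments left at None fall back to those of the current instance and that keyword arguments are merged.

```python
from torch import nn


class Projection(nn.Module):
    """Hypothetical module that remembers its constructor arguments."""

    def __init__(self, mod_dim, activate=nn.Identity(), inp_dim=1, **kwargs):
        super().__init__()
        self.mod_dim = mod_dim
        self.activate = activate
        self.inp_dim = inp_dim
        self.kwargs = kwargs
        self.project = nn.Linear(inp_dim, mod_dim, **kwargs)

    def forward(self, inp):
        return self.activate(self.project(inp))

    def new(self, mod_dim=None, activate=None, inp_dim=None, **kwargs):
        # Arguments left at None fall back to the current instance's values;
        # new keyword arguments are merged into (and override) the stored ones.
        return self.__class__(
            self.mod_dim if mod_dim is None else mod_dim,
            self.activate if activate is None else activate,
            self.inp_dim if inp_dim is None else inp_dim,
            **{**self.kwargs, **kwargs},
        )


small = Projection(8, bias=False)
large = small.new(mod_dim=64)  # keeps bias=False, changes only the width
print(large.mod_dim, large.project.bias)  # 64 None
```

The returned object is a separate instance with freshly initialized weights, so the original is left untouched.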

reset_parameters()

Re-initialize all internal parameters.

class GatedEmbedder(mod_dim, gate=Sigmoid(), inp_dim=1, **kwargs)

Bases: Resettable

Flexible Gated Linear Unit (GLU) for embedding a numerical feature.

Parameters:
  • mod_dim (int) – Desired embedding size. Will become the size of the last dimension of the output tensor.

  • gate (Module or function, optional) – The activation function to be applied to half of the (linearly) projected input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.

  • inp_dim (int, optional) – The number of features to embed together. Defaults to 1.

  • **kwargs – Additional keyword arguments to pass through to the linear layer.

forward(inp)

Embed a single numerical feature through a Gated Linear Unit (GLU).

Parameters:

inp (Tensor) – The last dimension of the input tensor is typically expected to be of size 1 and to contain the numerical value of a single feature. In case inp_dim was explicitly set to a value > 1 on instantiation, the size of the last dimension must match inp_dim, the number of numerical features to embed together.

Returns:

The output has the same number of dimensions as the input with the size of the last dimension changed to the specified mod_dim.

Return type:

Tensor
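A Gated Linear Unit of the kind described above might look like the following sketch. The class name GatedProjection is hypothetical, and which half of the projection is passed through the gate is an assumption; the documentation only states that one half is gated and multiplied with the other.

```python
import torch
from torch import nn


class GatedProjection(nn.Module):
    """Hypothetical stand-in for a GLU-style embedder."""

    def __init__(self, mod_dim, gate=nn.Sigmoid(), inp_dim=1, **kwargs):
        super().__init__()
        # Project to twice the model dimension so the result can be halved.
        self.project = nn.Linear(inp_dim, 2 * mod_dim, **kwargs)
        self.gate = gate

    def forward(self, inp):
        signal, gating = self.project(inp).chunk(2, dim=-1)
        # Assumption: the second half is gated and scales the first half.
        return signal * self.gate(gating)


embed = GatedProjection(mod_dim=8)
inp = torch.rand(32, 10, 1)
out = embed(inp)
print(out.shape)  # torch.Size([32, 10, 8])
```

With the default sigmoid gate, this sketch computes the same result as torch.nn.functional.glu applied to the projected input.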

new(mod_dim=None, gate=None, inp_dim=None, **kwargs)

Return a fresh instance with the same or updated parameters.

Parameters:
  • mod_dim (int, optional) – Desired embedding size. Will become the size of the last dimension of the output tensor. Overwrites the mod_dim of the current instance if given. Defaults to None.

  • gate (Module or function, optional) – The activation function to be applied to half of the (linearly) projected input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional. Overwrites the gate of the current instance if given. Defaults to None.

  • inp_dim (int, optional) – The number of features to embed together. Overwrites the inp_dim of the current instance if given. Defaults to None.

  • **kwargs – Additional keyword arguments are merged into the keyword arguments of the current instance and are then passed through to the linear layer together.

Returns:

A fresh, new instance of itself.

Return type:

GatedEmbedder

reset_parameters()

Re-initialize all internal parameters.

class GatedResidualEmbedder(mod_dim, activate=ELU(alpha=1.0), gate=Sigmoid(), drop=Dropout(p=0.0, inplace=False), inp_dim=1, **kwargs)

Bases: Resettable

Gated Residual Network (GRN) for embedding numerical features.

Parameters:
  • mod_dim (int) – Desired embedding size. Will become the size of the last dimension of the output tensor.

  • activate (Module or function, optional) – The activation function to be applied after (linear) projection into embedding space, but prior to gating. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to an ELU activation.

  • gate (Module or function, optional) – The activation function to be applied to half of the (non-linearly) projected input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Defaults to a sigmoid.

  • drop (Module, optional) – Typically an instance of Dropout or AlphaDropout. Defaults to Dropout(p=0.0), resulting in no dropout being applied.

  • inp_dim (int, optional) – The number of features to embed together. Defaults to 1.

  • **kwargs – Additional keyword arguments to pass through to the linear layers.

Note

This implementation is inspired by how features are encoded in Temporal Fusion Transformers, [1] but it is not quite the same. Firstly, the (linear) projection of scalar numerical features into embedding space happens inside the present module. Secondly, this embedding vector is not transformed again (as Eq. 4 seems to imply) and there is no option to add a context vector. Thirdly, the intermediate linear layer (Eq. 3) is eliminated and dropout is applied directly to the activations after the first layer. Finally, the layer norm (Eq. 2) is replaced by simply dividing the sum of (linearly projected) input and gated signal by 2. Should additional normalization be desired, it can be performed independently on the output of this module.

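References

[1] B. Lim, S. Ö. Arık, N. Loeff, and T. Pfister, Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting, arXiv:1912.09363.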

forward(inp)

Embed a numerical feature through a Gated Residual Network (GRN).

Parameters:

inp (Tensor) – The last dimension of the input tensor is typically expected to be of size 1 and to contain the numerical value of a single feature. In case inp_dim was explicitly set to a value > 1 on instantiation, the size of the last dimension must match inp_dim, the number of numerical features to embed together.

Returns:

The output has the same number of dimensions as the input with the size of the last dimension changed to the specified mod_dim.

Return type:

Tensor
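One plausible reading of the simplified Gated Residual Network described in the note of this class is sketched below. The class name GatedResidualProjection, the attribute names, and the exact layer layout (one projecting layer, one widening layer feeding the gate) are assumptions; only the order of operations (project, activate, drop, gate, average with the projected input) is taken from the documentation.

```python
import torch
from torch import nn


class GatedResidualProjection(nn.Module):
    """Hypothetical stand-in for a simplified GRN embedder."""

    def __init__(self, mod_dim, activate=nn.ELU(), gate=nn.Sigmoid(),
                 drop=nn.Dropout(0.0), inp_dim=1, **kwargs):
        super().__init__()
        self.project = nn.Linear(inp_dim, mod_dim, **kwargs)
        self.widen = nn.Linear(mod_dim, 2 * mod_dim, **kwargs)
        self.activate = activate
        self.gate = gate
        self.drop = drop

    def forward(self, inp):
        projected = self.project(inp)
        # Dropout acts directly on the activations after the first layer.
        hidden = self.drop(self.activate(projected))
        signal, gating = self.widen(hidden).chunk(2, dim=-1)
        gated = signal * self.gate(gating)
        # Instead of a layer norm, average the residual and the gated signal.
        return (projected + gated) / 2


embed = GatedResidualProjection(mod_dim=8)
out = embed(torch.rand(32, 10, 1))
print(out.shape)  # torch.Size([32, 10, 8])

# Additional normalization can be applied independently on the output.
normed = nn.LayerNorm(8)(out)
```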

new(mod_dim=None, activate=None, gate=None, drop=None, inp_dim=None, **kwargs)

Return a fresh instance with the same or updated parameters.

Parameters:
  • mod_dim (int, optional) – Desired embedding size. Will become the size of the last dimension of the output tensor. Overwrites the mod_dim of the current instance if given. Defaults to None.

  • activate (Module or function, optional) – The activation function to be applied after (linear) projection into embedding space, but prior to gating. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional, depending on whether it needs to be further parameterized or not. Overwrites the activate of the current instance if given. Defaults to None.

  • gate (Module or function, optional) – The activation function to be applied to half of the (non-linearly) projected input before multiplying with the other half. Must be a callable that accepts a tensor as sole argument, like a module from torch.nn or a function from torch.nn.functional. Overwrites the gate of the current instance if given. Defaults to None.

  • drop (Module, optional) – Typically an instance of Dropout or AlphaDropout. Overwrites the drop of the current instance if given. Defaults to None.

  • inp_dim (int, optional) – The number of features to embed together. Overwrites the inp_dim of the current instance if given. Defaults to None.

  • **kwargs – Additional keyword arguments are merged into the keyword arguments of the current instance and are then passed through to the linear layers together.

Returns:

A fresh, new instance of itself.

Return type:

GatedResidualEmbedder

reset_parameters()

Re-initialize all internal parameters.

class NumericalEmbedder(mod_dim, n_features, emb_cls, *args, **kwargs)

Bases: Resettable

Transform (scalar) numerical features into embedding vectors.

Parameters:
  • mod_dim (int) – Desired embedding size. Will become the size of the last dimension of the output tensor.

  • n_features (int) – Number of features to embed, which must equal the size of the last dimension of the input tensor.

  • emb_cls (type) – The PyTorch module to use as embedding class. Must take mod_dim as its first argument on instantiation, take tensors of size 1 in their last dimension, and change that dimension to size mod_dim.

  • *args – Additional arguments to use when instantiating emb_cls.

  • **kwargs – Additional keyword arguments to use when instantiating emb_cls.

property dim

The output tensor dimension index to stack features into.

property features

Range of feature indices.

forward(inp)

Forward pass for embedding scalar numerical features into vectors.

Parameters:

inp (Tensor) – The last dimension of the input tensor is expected to be of size n_features and to contain the scalar values of the individual numerical features.

Returns:

Compared to inp, the output tensor has one more dimension of size mod_dim, appended after the last dimension (of size n_features). It contains the stacked embeddings.

Return type:

Tensor
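The stacking of per-feature embeddings can be illustrated with the following sketch. Both class names are hypothetical; ScalarLinear merely serves as an example of an emb_cls that takes mod_dim as its first argument and maps a last dimension of size 1 to size mod_dim.

```python
import torch
from torch import nn


class ScalarLinear(nn.Linear):
    """Hypothetical emb_cls: takes mod_dim first, embeds one scalar."""

    def __init__(self, mod_dim, **kwargs):
        super().__init__(1, mod_dim, **kwargs)


class StackedNumericalEmbedding(nn.Module):
    """Hypothetical stand-in: one embedder per numerical feature."""

    def __init__(self, mod_dim, n_features, emb_cls, *args, **kwargs):
        super().__init__()
        self.embed = nn.ModuleList(
            [emb_cls(mod_dim, *args, **kwargs) for _ in range(n_features)]
        )

    def forward(self, inp):
        # Slicing with i:i + 1 keeps a last dimension of size 1 per feature.
        embedded = [
            emb(inp[..., i:i + 1]) for i, emb in enumerate(self.embed)
        ]
        # Stack into a new second-to-last dimension of size n_features.
        return torch.stack(embedded, dim=-2)


embed = StackedNumericalEmbedding(8, 3, ScalarLinear, bias=False)
inp = torch.rand(32, 10, 3)  # three scalar features in the last dimension
out = embed(inp)
print(out.shape)  # torch.Size([32, 10, 3, 8])
```

Because every feature gets its own embedder instance, the features do not share weights in this sketch.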

new(mod_dim=None, n_features=None, emb_cls=None, *args, **kwargs)

Return a fresh instance with the same or updated parameters.

Parameters:
  • mod_dim (int, optional) – Desired embedding size. Will become the size of the last dimension of the output tensor. Overwrites the mod_dim of the current instance if given. Defaults to None.

  • n_features (int, optional) – Number of features to embed, which must equal the size of the last dimension of the input tensor. Overwrites the n_features of the current instance if given. Defaults to None.

  • emb_cls (type, optional) – The PyTorch module to use as embedding class. Must take mod_dim as its first argument on instantiation. Overwrites the emb_cls of the current instance if given. Defaults to None.

  • *args – Additional arguments replace those of the current instance and are then used when instantiating emb_cls.

  • **kwargs – Additional keyword arguments are merged into the keyword arguments of the current instance and are then used together when instantiating emb_cls.

Returns:

A fresh, new instance of itself.

Return type:

NumericalEmbedder

reset_parameters()

Re-initialize all internal parameters.

class CategoricalEmbedder(mod_dim, cat_count=(), *cat_counts, **kwargs)

Bases: Resettable

Embed one or more categorical features as numerical vectors.

Parameters:
  • mod_dim (int) – Desired embedding size. Will become the size of the last dimension of the output tensor.

  • cat_count (int or iterable of int, optional) – One integer or an iterable (e.g., a tuple or list) of integers, each specifying the total number of categories in the respective feature. Defaults to an empty tuple.

  • *cat_counts (int) – Category counts for additional features. Together with the cat_count, the total number of category counts, i.e., the total number of features to embed, must match the size of the last dimension of the input tensor.

  • **kwargs – Keyword arguments are forwarded to the PyTorch Embedding class.

Note

The integer numbers identifying a category are expected to be zero-based, i.e., if the category count of a feature is 3, the allowed category identifiers are 0, 1, and 2. If you need a padding index (e.g., to mark missing/unknown values), do not forget to increase all cat_counts by one!

property dim

The output tensor dimension index to stack features into.

property features

Range of feature indices.

forward(inp)

Forward pass for embedding categorical features into vectors.

Parameters:

inp (Tensor) – Input tensor must be of dtype long. The size of the last dimension is expected to match the number of specified cat_counts and to contain the integer identifiers of the categories in the respective feature. These identifiers must all be lower in value than their respective count.

Returns:

Compared to inp, the output tensor has one more dimension of size mod_dim, appended after the last dimension (whose size equals the number of cat_counts). It contains the stacked embeddings.

Return type:

Tensor
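The following sketch illustrates both the stacking of categorical embeddings and the advice on padding indices given in the note of this class. The class name StackedCategoricalEmbedding is hypothetical; torch.nn.Embedding and its padding_idx argument are standard PyTorch. Here, two features with 3 and 5 real categories each reserve identifier 0 for padding, so the counts are raised to 4 and 6.

```python
import torch
from torch import nn


class StackedCategoricalEmbedding(nn.Module):
    """Hypothetical stand-in: one embedding table per categorical feature."""

    def __init__(self, mod_dim, *cat_counts, **kwargs):
        super().__init__()
        # Keyword arguments are forwarded to every nn.Embedding.
        self.embed = nn.ModuleList(
            [nn.Embedding(count, mod_dim, **kwargs) for count in cat_counts]
        )

    def forward(self, inp):
        # Look up each feature in its own table, then stack the results.
        embedded = [emb(inp[..., i]) for i, emb in enumerate(self.embed)]
        return torch.stack(embedded, dim=-2)


# Counts increased by one each to make room for the padding index 0.
embed = StackedCategoricalEmbedding(8, 4, 6, padding_idx=0)
inp = torch.tensor([[0, 2], [3, 5]])  # dtype long; 0 marks a missing value
out = embed(inp)
print(out.shape)  # torch.Size([2, 2, 8])
```

With padding_idx set, PyTorch initializes the embedding vector at that index to zeros and excludes it from gradient updates.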

property n_features

Number of features to embed.

new(mod_dim=None, cat_count=None, *cat_counts, **kwargs)

Return a fresh instance with the same or updated parameters.

Parameters:
  • mod_dim (int, optional) – Desired embedding size. Will become the size of the last dimension of the output tensor. Overwrites the mod_dim of the current instance if given. Defaults to None.

  • cat_count (int or iterable of int, optional) – One integer or an iterable (e.g., tuple or list) of integers, each specifying the number of categories in the respective feature. Overwrites the cat_count of the current instance if given. Defaults to None.

  • *cat_counts (int) – Category counts for additional features. Together with the cat_count, the total number of category counts must match the size of the last dimension of the input tensor.

  • **kwargs – Additional keyword arguments are merged into the keyword arguments of the current instance and are then used together for instantiating the PyTorch Embedding class.

Returns:

A fresh, new instance of itself.

Return type:

CategoricalEmbedder

reset_parameters()

Re-initialize all internal parameters.

class FeatureEmbedder(embed_num, embed_cat)

Bases: Resettable

Jointly embed numerical and categorical features into stacked vectors.

Given a float tensor where both numerical and categorical features appear (one before the other in the last dimension), instances of this class treat them on equal footing and produce stacked embedding vectors for all of them.

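Parameters:
  • embed_num (NumericalEmbedder) – Embedder for the numerical features.

  • embed_cat (CategoricalEmbedder) – Embedder for the categorical features.
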
Raises:

EmbeddingError – If the embedding dimension of the numerical and the categorical embedders do not match.

forward(inp)

Forward pass for numerical and categorical feature embeddings.

Parameters:

inp (Tensor) – Input tensor must be of dtype float. The last dimension is expected to contain first the values of all numerical features, followed by those of the categorical features.

Returns:

Compared to inp, the output tensor has one more dimension of size mod_dim, appended after the last dimension (whose size equals the total number of features). It contains the stacked embeddings, first those of the numerical and then those of the categorical features.

Return type:

Tensor
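The joint treatment of both feature types can be sketched as below. The function embed_features and the module-level names are hypothetical and use plain PyTorch layers in place of the embedders documented here. The sketch assumes that the categorical identifiers arrive as floats and are cast to long before the lookup.

```python
import torch
from torch import nn

n_num, cat_counts, mod_dim = 2, (3, 5), 8

# Hypothetical per-feature embedders built from plain PyTorch layers.
num_layers = nn.ModuleList([nn.Linear(1, mod_dim) for _ in range(n_num)])
cat_layers = nn.ModuleList(
    [nn.Embedding(count, mod_dim) for count in cat_counts]
)


def embed_features(inp):
    """Embed numerical features first, then categorical ones, and stack."""
    num = inp[..., :n_num]
    # Assumption: category identifiers are stored as floats in the input.
    cat = inp[..., n_num:].long()
    num_emb = [lay(num[..., i:i + 1]) for i, lay in enumerate(num_layers)]
    cat_emb = [lay(cat[..., i]) for i, lay in enumerate(cat_layers)]
    return torch.stack(num_emb + cat_emb, dim=-2)


# Two numerical values followed by two category identifiers per row.
inp = torch.tensor([[0.5, -1.2, 2.0, 4.0],
                    [1.5, 0.3, 0.0, 1.0]])
out = embed_features(inp)
print(out.shape)  # torch.Size([2, 4, 8])
```

Stacking only works because both groups of embedders produce vectors of the same size, which is why a mismatch in embedding dimension has to be rejected.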

property n_cat

Number of categorical features.

property n_features

Total number of features.

property n_num

Number of numerical features.

new(embed_num=None, embed_cat=None)

Return a fresh instance with the same or updated parameters.

Parameters:
  • embed_num (NumericalEmbedder, optional) – Overwrites the embed_num of the current instance if given. Defaults to None.

  • embed_cat (CategoricalEmbedder, optional) – Overwrites the embed_cat of the current instance if given. Defaults to None.

Returns:

A fresh, new instance of itself.

Return type:

FeatureEmbedder

reset_parameters()

Re-initialize all internal parameters.