losses

Custom loss functions that are not (yet) shipped with PyTorch out of the box.

Special loss functions are required in particular when networks are to make probabilistic predictions, that is, when they are set up to predict the parameters of an analytical probability mass or density function instead of just an expectation value. Such losses implement the negative log-likelihood of the matching probability distribution.

class Reduction(*values)[source]

Bases: StrEnum

Specify the aggregation level when evaluating loss functions.

MEAN = 'mean'
SUM = 'sum'
NONE = 'none'
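
Because Reduction derives from StrEnum, its members compare equal to the corresponding plain strings and can be passed wherever a "mean", "sum", or "none" literal is expected. The following minimal sketch assumes the classes documented here can be imported from a losses module; the actual import path is an assumption and may differ.

    import torch

    from losses import Reduction, RMSELoss  # hypothetical import path

    # StrEnum members are strings, so this holds:
    assert Reduction.MEAN == "mean"

    # Pass the enum member instead of a raw string to avoid typos.
    criterion = RMSELoss(reduction=Reduction.SUM)
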
class RMSELoss(reduction='mean', eps=1e-08)[source]

Bases: Module

Root mean squared error loss.

PyTorch only comes with a mean squared error loss. Since it is often more intuitive to compare error and target values when they are on the same scale, the square root of the MSE is implemented here in a naive, straightforward fashion.

Parameters:
  • reduction (string, optional) – One of “mean”, “sum” or “none”. Defaults to “mean”. Whether and, if so, how to aggregate the tensor resulting from evaluating the point-wise loss function on the input. Use the Reduction enum to avoid typos.

  • eps (float, optional) – Many loss functions require input and/or target tensors to be bound by some lower and/or upper value. It is the user’s responsibility to ensure that they are. However, evaluating a loss function just at the interval boundary of its support might lead to numerical inaccuracies. To avoid these, it is often advisable to shift such values away from boundaries by a small value eps. Defaults to 1e-8.

See also

Reduction

forward(y_hat, y_true)[source]

Compute RMSE loss between predicted and observed values.

For numerical stability, MSE values smaller than eps will be clamped to eps before taking the square root.

Parameters:
  • y_hat (Tensor) – Predicted expectation values.

  • y_true (Tensor) – Actually observed values. Should have the same dimensionality as y_hat, but this will not be checked for, let alone enforced.

Returns:

The root mean square error.

Return type:

Tensor
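
A minimal usage sketch, assuming the hypothetical import path introduced above; the clamping comment reflects the numerical-stability behavior described for forward.

    import torch

    from losses import RMSELoss  # hypothetical import path

    rmse = RMSELoss()  # defaults: reduction="mean", eps=1e-8

    y_hat = torch.tensor([2.5, 0.0, 2.0], requires_grad=True)  # predicted values
    y_true = torch.tensor([3.0, -0.5, 2.0])                    # observed values

    loss = rmse(y_hat, y_true)  # square root of the MSE, with the MSE clamped to at least eps
    loss.backward()             # gradients flow back into y_hat as usual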

class TweedieLoss(reduction='mean', eps=1e-06)[source]

Bases: _BaseLoss

Tweedie loss for zero-inflated, right-skewed, long-tailed targets.

Implements the deviance of the Tweedie distribution for power parameters between 1 and 2, where it can be seen as a compound Poisson-Gamma distribution.

Parameters:
  • reduction (string, optional) – One of “mean”, “sum” or “none”. Defaults to “mean”. Whether and, if so, how to aggregate the tensor resulting from evaluating the point-wise loss function on the input. Use the Reduction enum to avoid typos.

  • eps (float, optional) – Many loss functions require input and/or target tensors to be bound by some lower and/or upper value. It is the user’s responsibility to ensure that they are. However, evaluating a loss function just at the interval boundary of its support might lead to numerical inaccuracies. To avoid these, it is often advisable to shift such values away from boundaries by a small value eps. Defaults to 1e-6.

See also

Reduction

forward(mu, p, y_true)[source]

Forward pass for the Tweedie loss function.

Parameters:
  • mu (Tensor) – Predicted expectation values. Should be greater than or equal to 0, but this will not be checked for, let alone enforced. However, eps will be added to ensure numerical stability.

  • p (Tensor) – Predicted values for the power parameter. Should have values strictly between 1 and 2. All values beyond that range will be clamped to lie within this range (by at least eps).

  • y_true (Tensor) – Actually observed values. Should be greater than or equal to 0, but this will not be checked for, let alone enforced.

Returns:

Proportional to the negative log-likelihood of a Tweedie distribution for 1 < p < 2.

Return type:

Tensor
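
A hedged sketch of how a network head might produce valid inputs for this loss. Using exp for mu and 1 + sigmoid for p is an illustrative choice, not something the API prescribes; the import path is again an assumption.

    import torch

    from losses import TweedieLoss  # hypothetical import path

    tweedie = TweedieLoss()  # defaults: reduction="mean", eps=1e-6

    raw = torch.randn(8, 2)             # stand-in for a network's two output units
    mu = torch.exp(raw[:, 0])           # keeps the predicted mean positive
    p = 1.0 + torch.sigmoid(raw[:, 1])  # keeps the power parameter within (1, 2)

    y_true = torch.tensor([0.0, 0.0, 1.2, 0.0, 3.4, 0.0, 0.7, 5.1])  # zero-inflated target

    loss = tweedie(mu, p, y_true)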

class BetaBernoulliLoss(reduction='mean', eps=1e-06)[source]

Bases: _BaseLoss

Special case of the Beta-Binomial negative log-likelihood for 1 trial.

Use to make probabilistic predictions for binary classification, where the model outputs the alpha and beta coefficients of the Beta-distributed success probability instead of a point estimate of the success probability.

Parameters:
  • reduction (string, optional) – One of “mean”, “sum” or “none”. Defaults to “mean”. Whether and, if so, how to aggregate the tensor resulting from evaluating the point-wise loss function on the input. Use the Reduction enum to avoid typos.

  • eps (float, optional) – Many loss functions require input and/or target tensors to be bound by some lower and/or upper value. It is the user’s responsibility to ensure that they are. However, evaluating a loss function just at the interval boundary of its support might lead to numerical inaccuracies. To avoid these, it is often advisable to shift such values away from boundaries by a small value eps. Defaults to 1e-6.

See also

Reduction

forward(alpha, beta, y_true)[source]

Forward pass for the Beta-Binomial loss function.

Parameters:
  • alpha (Tensor) – Predicted values for the alpha parameter of the Beta-distributed success probability in binary classification. Should be greater than or equal to 0, but this will not be checked for, let alone enforced. However, eps is added to ensure numerical stability.

  • beta (Tensor) – Predicted values for the beta parameter of the Beta-distributed success probability in binary classification. Should be greater than or equal to 0, but this will not be checked for, let alone enforced. However, eps is added to ensure numerical stability.

  • y_true (Tensor) – Actually observed binary outcomes, encoded as 1.0 for success and 0.0 for failure.

Returns:

Negative log-likelihood of the Beta-Binomial distribution for the special case of one trial.

Return type:

Tensor
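
A hedged sketch of probabilistic binary classification with this loss. The softplus heads for alpha and beta are an illustrative assumption; any transformation yielding positive values would do. The import path is hypothetical.

    import torch
    import torch.nn.functional as F

    from losses import BetaBernoulliLoss  # hypothetical import path

    beta_bernoulli = BetaBernoulliLoss()

    raw = torch.randn(16, 2)       # stand-in for the model's two output units
    alpha = F.softplus(raw[:, 0])  # keep both Beta parameters positive
    beta = F.softplus(raw[:, 1])

    y_true = torch.randint(0, 2, (16,)).float()  # binary outcomes, 1.0 = success

    loss = beta_bernoulli(alpha, beta, y_true)

    # A point prediction can still be recovered from the Beta parameters:
    p_success = alpha / (alpha + beta)  # expected success probability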

class GammaLoss(reduction='mean', eps=1e-06)[source]

Bases: _BaseLoss

Negative log-likelihood of a scaled Gamma distribution.

Potentially useful for strictly positive targets and skewed residuals. For convenience, the Gamma distribution is parameterized in terms of mean and standard deviation instead of its standard form in terms of shape and scale (or rate) parameters.

Parameters:
  • reduction (string, optional) – One of “mean”, “sum” or “none”. Defaults to “mean”. Whether and, if so, how to aggregate the tensor resulting from evaluating the point-wise loss function on the input. Use the Reduction enum to avoid typos.

  • eps (float, optional) – Many loss functions require input and/or target tensors to be bound by some lower and/or upper value. It is the user’s responsibility to ensure that they are. However, evaluating a loss function just at the interval boundary of its support might lead to numerical inaccuracies. To avoid these, it is often advisable to shift such values away from boundaries by a small value eps. Defaults to 1e-6.

See also

Reduction

forward(mu, sigma, y_true)[source]

Forward pass for the scaled Gamma loss function.

Parameters:
  • mu (Tensor) – Predicted mean values. Should be greater than or equal to 0, but this will not be checked for, let alone enforced. However, eps is added to ensure numerical stability.

  • sigma (Tensor) – Predicted standard deviations. Should be greater than or equal to 0, but this will not be checked for, let alone enforced. However, eps is added to ensure numerical stability.

  • y_true (Tensor) – Actually observed values. Should be greater than or equal to 0, but this will not be checked for, let alone enforced. However, eps is added to ensure numerical stability.

Returns:

The negative log-likelihood of a scaled Gamma distribution.

Return type:

Tensor
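
A hedged sketch with softplus heads to keep mean and standard deviation positive; the head design and the import path are illustrative assumptions, not part of the API.

    import torch
    import torch.nn.functional as F

    from losses import GammaLoss  # hypothetical import path

    gamma = GammaLoss()

    raw = torch.randn(8, 2)        # stand-in for the model's two output units
    mu = F.softplus(raw[:, 0])     # predicted means, kept positive
    sigma = F.softplus(raw[:, 1])  # predicted standard deviations, kept positive

    y_true = torch.rand(8) * 10.0  # strictly positive targets in practice

    loss = gamma(mu, sigma, y_true)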

class StudentLoss(reduction='mean', eps=1e-06)[source]

Bases: _BaseLoss

Negative log-likelihood of a non-standardized Student’s t distribution.

Parameters:
  • reduction (string, optional) – One of “mean”, “sum” or “none”. Defaults to “mean”. Whether and, if so, how to aggregate the tensor resulting from evaluating the point-wise loss function on the input. Use the Reduction enum to avoid typos.

  • eps (float, optional) – Many loss functions require input and/or target tensors to be bound by some lower and/or upper value. It is the user’s responsibility to ensure that they are. However, evaluating a loss function just at the interval boundary of its support might lead to numerical inaccuracies. To avoid these, it is often advisable to shift such values away from boundaries by a small value eps. Defaults to 1e-6.

See also

Reduction

forward(df, loc, scale, y_true)[source]

Forward pass for the Student’s t loss function.

Parameters:
  • df (Tensor) – Predicted degrees of freedom. Should be greater than zero, but this will not be checked for, let alone enforced. However, eps is added for numerical stability.

  • loc (Tensor) – Predicted mean values.

  • scale (Tensor) – Predicted scales. Should be greater than or equal to 0, but this will not be checked for, let alone enforced. However, eps is added to ensure numerical stability.

  • y_true (Tensor) – Actually observed values.

Returns:

Negative log-likelihood of a Student’s t distribution.

Return type:

Tensor
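
A hedged sketch with a three-unit head producing degrees of freedom, location, and scale; the softplus transformations and the import path are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    from losses import StudentLoss  # hypothetical import path

    student = StudentLoss()

    raw = torch.randn(8, 3)        # stand-in for the model's three output units
    df = F.softplus(raw[:, 0])     # degrees of freedom, kept positive
    loc = raw[:, 1]                # location is unconstrained
    scale = F.softplus(raw[:, 2])  # scale, kept positive

    y_true = torch.randn(8)        # possibly heavy-tailed regression targets

    loss = student(df, loc, scale, y_true)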

class NegativeBinomialLoss(reduction='mean', eps=1e-06)[source]

Bases: _BaseLoss

Negative log-likelihood of a Negative-Binomial distribution.

Potentially useful for count data where more flexibility than a Poisson loss provides is desired because the data might be over-dispersed. For convenience, the Negative-Binomial distribution is parameterized in terms of mean and standard deviation instead of its standard form.

Parameters:
  • reduction (string, optional) – One of “mean”, “sum” or “none”. Defaults to “mean”. Whether and, if so, how to aggregate the tensor resulting from evaluating the point-wise loss function on the input. Use the Reduction enum to avoid typos.

  • eps (float, optional) – Many loss functions require input and/or target tensors to be bound by some lower and/or upper value. It is the user’s responsibility to ensure that they are. However, evaluating a loss function just at the interval boundary of its support might lead to numerical inaccuracies. To avoid these, it is often advisable to shift such values away from boundaries by a small value eps. Defaults to 1e-6.

Note

The present parameterization only makes sense if the variance is strictly greater than the mean. This is best taken into account already on the model side, e.g., by forcing the output for sigma to be that of mu multiplied by one plus some (learnable) fraction of mu, as described by D. Salinas et al. in their DeepAR paper. [1]

References

[1] Salinas, D., Flunkert, V., Gasthaus, J., & Januschowski, T. (2020). DeepAR: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting, 36(3), 1181–1191.

forward(mu, sigma, y_true)[source]

Forward pass for the Negative-Binomial loss function.

Parameters:
  • mu (Tensor) – Predicted mean values. Should be greater than or equal to 0, but this will not be checked for, let alone enforced. However, eps is added to ensure numerical stability.

  • sigma (Tensor) – Predicted standard deviations. Should be greater than or equal to the square root of mu, but this will not be checked for, let alone enforced. However, eps is added to ensure numerical stability.

  • y_true (Tensor) – Actually observed values. Should be greater than or equal to 0, but this will not be checked for, let alone enforced.

Returns:

The negative log-likelihood of a Negative-Binomial distribution.

Return type:

Tensor
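
A hedged sketch that also illustrates one possible reading of the note above: deriving sigma from mu and a learnable over-dispersion factor so that the variance always exceeds the mean. The softplus heads, the exact formula for sigma, and the import path are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    from losses import NegativeBinomialLoss  # hypothetical import path

    neg_binom = NegativeBinomialLoss()

    raw = torch.randn(8, 2)        # stand-in for the model's two output units
    mu = F.softplus(raw[:, 0])     # predicted mean counts, kept positive
    alpha = F.softplus(raw[:, 1])  # learnable over-dispersion factor

    # Variance = mu * (1 + alpha * mu) is strictly greater than mu for alpha > 0,
    # so sigma is guaranteed to exceed the square root of mu.
    sigma = torch.sqrt(mu * (1.0 + alpha * mu))

    y_true = torch.poisson(torch.full((8,), 3.0))  # stand-in for observed counts

    loss = neg_binom(mu, sigma, y_true)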

class XEntropyLoss(weight=None, ignore_index=-100, reduction='mean', label_smoothing=0.0)[source]

Bases: CrossEntropyLoss

Subclass of PyTorch’s CrossEntropyLoss with added functionality.

When in training mode (toggled by calling the module's train method without argument or with True), label_smoothing is applied according to the value provided at instantiation. When in evaluation mode, however, label smoothing is set to 0.0 to report reproducible and unbiased values for test or validation loss and (log-)perplexity.

Parameters:
  • weight (Tensor, optional) – A manual rescaling weight given to each class. If given, has to be a 1D-tensor of a size equal to the number of classes and a floating point dtype. Defaults to None.

  • ignore_index (int, optional) – Specifies a target value that is ignored and does not contribute to the input gradient. When reduction is “mean”, the loss is averaged only over non-ignored targets. Only applicable when the target contains class indices. Defaults to -100.

  • reduction (str, optional) – Specifies the reduction to apply to the output: “none”, “mean”, or “sum”. Defaults to “mean”, which means the weighted mean in case a valid weight was provided. Use the Reduction enum to avoid typos.

  • label_smoothing (float, optional) – Specifies the amount of smoothing when computing the loss. Must lie in the interval [0.0, 1.0], where 0.0 means no smoothing. The targets become a mixture of the original ground truth and a uniform distribution, as described in Rethinking the Inception Architecture for Computer Vision. Defaults to 0.0.

Note

For more information on this loss, please refer to the full PyTorch documentation.

See also

Reduction

eval()[source]

Toggle evaluation mode by setting label-smoothing to 0.0.

Calling eval is equivalent to calling the method train(False).

Returns:

Itself in evaluation mode.

Return type:

XEntropyLoss

train(mode=True)[source]

Toggle training mode, where label-smoothing is applied as given at instantiation.

Parameters:

mode (bool, optional) – Whether to switch training mode on (True) or off (False). Defaults to True.

Returns:

Itself in the requested mode.

Return type:

XEntropyLoss
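
A hedged sketch of the train/eval behavior described above: label smoothing is active during training and switched off for evaluation. The import path is an assumption; the call signature is that of PyTorch's CrossEntropyLoss.

    import torch

    from losses import XEntropyLoss  # hypothetical import path

    xent = XEntropyLoss(label_smoothing=0.1)

    logits = torch.randn(4, 10)           # batch of 4 samples, 10 classes
    targets = torch.randint(0, 10, (4,))  # class indices

    smoothed = xent(logits, targets)  # training mode: smoothing of 0.1 is applied

    xent.eval()                       # evaluation mode: smoothing set to 0.0 ...
    plain = xent(logits, targets)     # ... for reproducible, unbiased validation loss
    xent.train()                      # back to training mode with smoothing 0.1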