ml

Components for building, training, and talking to a language model.

class Evaluator(loss, batch_size, show_progress=True)

Bases: ArgRepr

Compute statistics on hold-out data to evaluate model performance.

Parameters:
  • loss (Module) – An instance of CrossEntropyLoss with exactly the same parameters that were used to train your model, with the possible exception of label_smoothing, which may be set to 0.0.

  • batch_size (int) – The batch size to request from the data producer when computing evaluation metrics.

  • show_progress (bool, optional) – Whether to show a progress bar in the console during validation. Defaults to True.

Raises:

ValueError – If the “reduction” of the loss is not “mean”.
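
As an illustration of the constraints on loss, the following is a minimal sketch of a compatible loss configuration. It assumes that the padding index is passed to the loss as ignore_index and that it equals 0; neither is stated above, and PAD_ID is a name made up for this example.

```python
import math

import torch
from torch import nn

PAD_ID = 0  # hypothetical padding index, chosen for this sketch only

# Same settings as during training, except that label smoothing is off.
eval_loss = nn.CrossEntropyLoss(
    ignore_index=PAD_ID,  # assumption: padding targets are excluded this way
    label_smoothing=0.0,
    reduction="mean",     # any other reduction would raise the ValueError
)

# Uniform logits over a vocabulary of 4 tokens: shape (batch_size, vocab, S).
logits = torch.zeros(2, 4, 3)
targets = torch.tensor([[1, 2, PAD_ID],
                        [3, PAD_ID, PAD_ID]])

# Mean over the three non-padding tokens, each of which contributes log(4).
value = eval_loss(logits, targets).item()
```

With reduction set to "mean" and an ignore_index, PyTorch divides by the number of non-ignored targets, so the padding positions do not dilute the average.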

__call__(model, data)

Compute metrics on validation data to evaluate model performance.

Parameters:
  • model (Module) – The model to evaluate.

  • data (TestData) – The hold-out validation data to evaluate the model on.

Returns:

  • loss (float) – The loss averaged over all non-padding tokens.

  • perplexity (float) – The perplexity averaged over all sequences.

  • accuracy (float) – The fraction of non-padding tokens predicted correctly.

  • top_2 (float) – The fraction of correct non-padding tokens within the top-2 most probable tokens predicted by the model.

  • top_5 (float) – The fraction of correct non-padding tokens within the top-5 most probable tokens predicted by the model.
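
The returned values are averages over the whole data set, while the helper methods of this class return per-batch sums. One plausible way to combine the two, sketched here under the assumption that running totals are kept and divided once at the end, is shown below. The Totals class and all numbers are invented for illustration and are not part of the library.

```python
from dataclasses import dataclass


@dataclass
class Totals:
    """Running sums over batches; a hypothetical helper for this sketch."""

    n_tokens: int = 0        # non-padding tokens seen so far
    n_sequences: int = 0     # sequences seen so far
    loss: float = 0.0        # per-token loss, summed
    perplexity: float = 0.0  # per-sequence perplexity, summed
    top_1: int = 0           # correct top-1 predictions

    def averages(self):
        # Token-level metrics divide by tokens, perplexity by sequences.
        return {
            "loss": self.loss / self.n_tokens,
            "perplexity": self.perplexity / self.n_sequences,
            "accuracy": self.top_1 / self.n_tokens,
        }


totals = Totals()
# Made-up batches: (tokens, sequences, mean loss, summed perplexity, top-1 hits).
batches = [(3, 2, 1.0, 6.0, 2), (5, 2, 2.0, 14.0, 3)]
for n_tok, n_seq, mean_loss, ppl_sum, hits in batches:
    totals.n_tokens += n_tok
    totals.n_sequences += n_seq
    totals.loss += mean_loss * n_tok  # undo the per-batch mean before summing
    totals.perplexity += ppl_sum
    totals.top_1 += hits

metrics = totals.averages()
```

Weighting each batch's mean loss by its token count keeps the final loss a true per-token average even when batches contain different amounts of padding.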

property pad_id

Index of the padding token.

perplexity(logits, targets)

Compute the perplexity summed over a minibatch of sequences.

Parameters:
  • logits (Tensor) – The logits predicted by the model. Must have shape (batch_size, vocab, S), where S is the (padded) sequence length.

  • targets (Tensor) – The true indices of the tokens the model should predict. Must have shape (batch_size, S).

Returns:

The perplexity summed over all sequences in the minibatch.

Return type:

Tensor

Note

Sequences consisting of only padding tokens are not expected and will lead to a division by zero.
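
To make the computation and the division-by-zero caveat concrete, here is a minimal sketch of a summed per-sequence perplexity, assuming the usual definition exp(mean negative log-likelihood over non-padding tokens). The function summed_perplexity and its explicit pad_id argument are illustrative and may differ from the actual implementation.

```python
import torch
import torch.nn.functional as F


def summed_perplexity(logits, targets, pad_id):
    """Sum of per-sequence perplexities; a sketch, not the library's code."""
    # Per-token negative log-likelihood, shape (batch_size, S); zero at padding.
    nll = F.cross_entropy(logits, targets, ignore_index=pad_id, reduction="none")
    # Number of non-padding tokens per sequence, shape (batch_size,).
    n_tokens = (targets != pad_id).sum(dim=1)
    # exp(mean NLL) per sequence; an all-padding row divides by zero here.
    return torch.exp(nll.sum(dim=1) / n_tokens).sum()


logits = torch.zeros(2, 4, 3)  # uniform prediction over a vocabulary of 4
targets = torch.tensor([[1, 2, 0], [3, 0, 0]])  # 0 is the assumed padding index
total = summed_perplexity(logits, targets, pad_id=0)
```

A uniform prediction over 4 tokens has a perplexity of 4 for every sequence regardless of its length, so the two sequences sum to 8.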

top(k, logits, targets)

Top-k correct predictions summed over all non-padding tokens.

Parameters:
  • k (int) – The target token index has to be within the top k most probable indices predicted by the model, provided it is not padding.

  • logits (Tensor) – The logits predicted by the model. Must have shape (batch_size, vocab, S), where S is the (padded) sequence length.

  • targets (Tensor) – The true indices of the tokens the model should predict. Must have shape (batch_size, S).

Returns:

Count of correct top-k predictions over all non-padding tokens.

Return type:

Tensor
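
The counting described above can be sketched as follows. This is an illustrative reimplementation built on torch.topk, not the library's own code; the function name top_k_correct and the explicit pad_id argument are assumptions made for the example.

```python
import torch


def top_k_correct(k, logits, targets, pad_id):
    """Count non-padding targets among the k highest logits (illustrative)."""
    # Indices of the k largest logits along the vocab axis: (batch_size, k, S).
    top_indices = logits.topk(k, dim=1).indices
    # True where the target matches any of the k candidates: (batch_size, S).
    hit = (top_indices == targets.unsqueeze(1)).any(dim=1)
    # Discard padding positions before counting.
    return (hit & (targets != pad_id)).sum()


# One sequence of length 3 over a vocabulary of 4; index 0 is assumed padding.
# Rows are vocabulary entries, columns are sequence positions.
logits = torch.tensor([[[0.1, 0.2, 0.9],
                        [0.7, 0.5, 0.0],
                        [0.2, 0.6, 0.0],
                        [0.0, 0.1, 0.1]]])
targets = torch.tensor([[1, 1, 0]])

top_1 = top_k_correct(1, logits, targets, pad_id=0)
top_2 = top_k_correct(2, logits, targets, pad_id=0)
```

Here the first target is the most probable token, the second is only the runner-up, and the third is padding, so top_1 counts one hit and top_2 counts two.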