ml

Components for building, training, and talking to a language model.

subpackages

class Evaluator(loss, batch_size, show_progress=True)

Bases: ArgRepr

Compute statistics on hold-out data to evaluate model performance.

Parameters:

loss (Module) – An instance of CrossEntropyLoss with exactly the same parameters that were used to train your model, with the possible exception of label_smoothing, which may be set to 0.0.
batch_size (int) – The batch size to request from the data producer when computing evaluation metrics.
show_progress (bool, optional) – Whether to show a progress bar that provides visual feedback in the console during the validation process. Defaults to True.

Raises:

ValueError – If the “reduction” of the loss is not “mean”.

__call__(model, data)

Compute metrics on validation data to evaluate model performance.

Parameters:

Returns:

loss (float) – The loss averaged over all non-padding tokens.
perplexity (float) – The perplexity averaged over all sequences.
accuracy (float) – The fraction of non-padding tokens predicted correctly.
top_2 (float) – The fraction of non-padding tokens whose true index is among the 2 most probable tokens predicted by the model.
top_5 (float) – The fraction of non-padding tokens whose true index is among the 5 most probable tokens predicted by the model.
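The helper methods documented here return sums and counts rather than averages, which suggests that per-batch values are accumulated and normalized once at the end. The following is a minimal sketch of such an aggregation, not the library's actual implementation. The function name `aggregate` and the dictionary keys are hypothetical, chosen only for illustration.

```python
def aggregate(batches):
    """Combine per-batch statistics into the five reported metrics.

    Each batch is a dict with hypothetical keys (not part of the library):
      'mean_loss'  loss averaged over the batch's non-padding tokens
      'n_tokens'   number of non-padding tokens in the batch
      'n_seqs'     number of sequences in the batch
      'ppl_sum'    perplexity summed over the batch's sequences
      'top1', 'top2', 'top5'  counts of correct top-k predictions
    """
    n_tokens = sum(b["n_tokens"] for b in batches)
    n_seqs = sum(b["n_seqs"] for b in batches)
    # Weight each batch's mean loss by its token count, so that the result
    # is an average over all non-padding tokens and not over batches.
    loss = sum(b["mean_loss"] * b["n_tokens"] for b in batches) / n_tokens
    return {
        "loss": loss,
        "perplexity": sum(b["ppl_sum"] for b in batches) / n_seqs,
        "accuracy": sum(b["top1"] for b in batches) / n_tokens,
        "top_2": sum(b["top2"] for b in batches) / n_tokens,
        "top_5": sum(b["top5"] for b in batches) / n_tokens,
    }
```

Weighting by token count matters when batches contain different amounts of padding: a plain mean of per-batch losses would over-weight batches with few real tokens.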

perplexity(logits, targets)

Compute the perplexity summed over a minibatch of sequences.

Parameters:

logits (Tensor) – The logits predicted by the model. Must have shape (batch_size, vocab, S), where S is the (padded) sequence length.
targets (Tensor) – The true indices of the tokens the model should predict. Must have shape (batch_size, S).

Returns:

The perplexity summed over all sequences in the minibatch.

Return type:

Tensor

Note

Sequences consisting of only padding tokens are not expected and will lead to a division by zero.
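To make the computation and the division-by-zero caveat concrete, here is a minimal sketch in plain Python, using nested lists in place of tensors. It is not the library's implementation. It assumes that the perplexity of a sequence is the exponential of its mean negative log-likelihood over non-padding tokens, and that padding is marked by the index -100 (the default `ignore_index` of PyTorch's CrossEntropyLoss). The names `PAD`, `log_softmax`, and `perplexity_sum` are hypothetical.

```python
import math

PAD = -100  # assumed padding index; the actual value depends on your loss


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def perplexity_sum(logits, targets, pad=PAD):
    """Sum of per-sequence perplexities over a minibatch.

    logits:  nested lists of shape (batch_size, vocab, S)
    targets: nested lists of shape (batch_size, S)
    """
    total = 0.0
    for seq_logits, seq_targets in zip(logits, targets):
        nll, n_tokens = 0.0, 0
        for pos, target in enumerate(seq_targets):
            if target == pad:
                continue  # padding positions do not contribute
            # Collect the scores of all vocabulary entries at this position.
            scores = [row[pos] for row in seq_logits]
            nll -= log_softmax(scores)[target]
            n_tokens += 1
        # An all-padding sequence leaves n_tokens at 0, so this division
        # raises ZeroDivisionError.
        total += math.exp(nll / n_tokens)
    return total
```

With uniform logits over a vocabulary of size 4, every non-padding token has probability 1/4, so each sequence contributes a perplexity of 4 regardless of how much of it is padding.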

top(k, logits, targets)

Top-k correct predictions summed over all non-padding tokens.

Parameters:

k (int) – The number of most probable predictions to consider. A non-padding target counts as correct if its index is among the k most probable indices predicted by the model.
logits (Tensor) – The logits predicted by the model. Must have shape (batch_size, vocab, S), where S is the (padded) sequence length.
targets (Tensor) – The true indices of the tokens the model should predict. Must have shape (batch_size, S).

Returns:

Count of correct top-k predictions over all non-padding tokens.

Return type:

Tensor
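The counting rule can be illustrated with a minimal sketch in plain Python, again using nested lists in place of tensors. It is not the library's implementation. The padding index -100 is an assumption, and the names `PAD` and `top_k_correct` are hypothetical.

```python
PAD = -100  # assumed padding index; the actual value depends on your loss


def top_k_correct(k, logits, targets, pad=PAD):
    """Count non-padding targets that fall within the top-k predictions.

    logits:  nested lists of shape (batch_size, vocab, S)
    targets: nested lists of shape (batch_size, S)
    """
    correct = 0
    for seq_logits, seq_targets in zip(logits, targets):
        for pos, target in enumerate(seq_targets):
            if target == pad:
                continue  # padding positions are never counted
            # Scores of all vocabulary entries at this position.
            scores = [row[pos] for row in seq_logits]
            # Vocabulary indices of the k highest scores.
            ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
            if target in ranked[:k]:
                correct += 1
    return correct
```

Dividing this count by the total number of non-padding tokens gives a fraction; with k = 1 that fraction is the plain accuracy.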