generators

Various algorithms to produce model responses from user input.

class Greedy(tokenizer, model, max_tokens=256, **_)

Bases: NextToken

Simply pick the most probable token, one at a time.

Parameters:
  • tokenizer (Algo) – Fully configured Algo wrapper around a trained tokenizer.

  • model (Module) – The trained PyTorch model to use for text generation.

  • max_tokens (int, optional) – The maximum number of tokens to generate in case the end-of-sequence token is not predicted by the model first. Defaults to 256.

__call__(prompt)

Iteratively generate a text answer from the model responses.

Parameters:

prompt (str) – The input sequence that the model should continue.

Returns:

  • answer (str) – The text answer generated by the model.

  • terminated (bool) – Whether the model answer ended with an end-of-sequence token, the alternative being that the model answer exceeds the specified max_tokens length.

property context

Maximum sequence length that the provided model can handle.

property eos_id

Integer ID of the end-of-sequence token.

logits(src, mask, more)

Call model to produce un-normalized probabilities over the vocab.

Parameters:
  • src (Tensor) – PyTorch tensor with the prompt converted to integer token IDs of shape (1, S) with S being the number of tokens.

  • mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.

  • more (bool) – If True, the input to the model ends with an end-of-sequence (EOS) token and, therefore, the model must predict at least one token that is not EOS. If False, the model may predict EOS first, but must follow that up with at least one non-EOS token.

Returns:

  • logits (Tensor) – 1-D PyTorch tensor with un-normalized probabilities over all permissible tokens in the vocabulary.

  • offset (int) – Index of the token that the first entry in logits refers to.

next_token_from_logits(logits)

Take the argmax over the probabilities of permissible tokens.

Parameters:

logits (Tensor) – 1-D PyTorch tensor with un-normalized probabilities over all permissible tokens in the vocabulary.

Returns:

Int64 scalar with the argmax of logits.

Return type:

Tensor
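The selection rule can be sketched without any framework. The following is a minimal illustration only: it uses a plain Python list in place of the 1-D tensor, and the function name greedy_pick is hypothetical, not part of this module.

```python
def greedy_pick(logits):
    """Return the position of the largest logit (ties go to the first one)."""
    if not logits:
        raise ValueError("logits must not be empty")
    return max(range(len(logits)), key=logits.__getitem__)


position = greedy_pick([0.1, 2.5, -1.0, 2.5])
print(position)
```

Because softmax is monotonic, the argmax of the logits is also the argmax of the normalized probabilities, so no normalization is needed for a greedy pick.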

predict(src, mask, more)

Predict one permissible token at a time until end-of-sequence.

Parameters:
  • src (Tensor) – PyTorch tensor with the prompt converted to integer token IDs of shape (1, S) with S being the number of tokens.

  • mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.

  • more (bool) – If True, the input to the model ends with an end-of-sequence (EOS) token and, therefore, the model must predict at least one token that is not EOS. If False, the model may predict EOS first, but must follow that up with at least one non-EOS token.

Returns:

Integer token IDs of the model response.

Return type:

list

step(token, src, mask)

Concatenate a newly predicted token to the input sequence.

Parameters:
  • token (Tensor) – The ID of the token to append to src as an int64 tensor broadcastable to shape (1, 1).

  • src (Tensor) – The input sequence of shape (1, S) with S being the number of tokens.

  • mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.

Returns:

  • src (Tensor) – The input sequence of maximum length context with the new token appended to it.

  • mask (Tensor) – The input mask of maximum length context with a 0 appended.
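A rough sketch of what such a step might look like, using plain lists for the single row of src and mask. It assumes that the oldest entries are dropped once the sequence would exceed context; the source does not state which end is truncated. The explicit context argument and the name append_token are hypothetical.

```python
def append_token(token, src, mask, context):
    """Append one token ID and one 0.0 mask entry, keeping the last `context` items."""
    if context < 1:
        raise ValueError("context must be a positive integer")
    # Assumption: when the sequence grows too long, the oldest entries are dropped.
    new_src = (src + [token])[-context:]
    new_mask = (mask + [0.0])[-context:]
    return new_src, new_mask


src, mask = append_token(7, [3, 4, 5], [float("-inf"), 0.0, 0.0], context=3)
print(src, mask)
```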

property vocab

Actual vocabulary size of the tokenizer to use.

property zero

Tensor of one 0 shaped to append to the unknown-token mask.

class TopK(tokenizer, model, max_tokens=256, k=None, temperature=1.0, **_)

Bases: NextToken

Randomly draw the next token from among the k most likely ones.

Parameters:
  • tokenizer (Algo) – Fully configured Algo wrapper around a trained tokenizer.

  • model (Module) – The trained PyTorch model to use for text generation.

  • max_tokens (int, optional) – The maximum number of tokens to generate in case the end-of-sequence token is not predicted by the model first. Defaults to 256.

  • k (int or float, optional) – If an integer number > 0, the next token is drawn from a categorical distribution over the k most probable of all eligible tokens. A floating point number from the interval (0.0, 1.0) is interpreted as a fraction of all eligible tokens. Defaults to None, resulting in a random draw from a categorical distribution over all eligible tokens.

  • temperature (float, optional) – Higher temperatures concentrate more probability mass onto the most likely tokens, while lower temperatures spread the probability mass out among all eligible tokens. Defaults to 1.0, which results in unmodified logits.
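The handling of k described above can be sketched in plain Python. This is an illustration under assumptions, not the class's actual code: the helper names resolve_k and top_k_sample are hypothetical, rounding a fractional k to the nearest count is a guess, and temperature scaling is left out.

```python
import math
import random


def resolve_k(k, n):
    """Turn an integer count or a fraction in (0, 1) into a count in [1, n]."""
    if k is None:
        return n  # no restriction: draw from all eligible tokens
    if isinstance(k, float):
        if not 0.0 < k < 1.0:
            raise ValueError("fractional k must lie in (0.0, 1.0)")
        # Assumption: the fraction is rounded to the nearest whole count.
        return max(1, min(n, round(k * n)))
    if k < 1:
        raise ValueError("integer k must be > 0")
    return min(k, n)


def top_k_sample(logits, k=None, rng=random):
    """Draw a position from a softmax restricted to the k largest logits."""
    count = resolve_k(k, len(logits))
    ranked = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)
    kept = ranked[:count]
    top = logits[kept[0]]
    # Subtracting the largest logit keeps exp() numerically stable.
    weights = [math.exp(logits[i] - top) for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


print(top_k_sample([0.1, 2.5, -1.0], k=2, rng=random.Random(0)))
```

With k=1 the draw degenerates to the greedy choice, and with k=None every eligible token remains a candidate.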

__call__(prompt)

Iteratively generate a text answer from the model responses.

Parameters:

prompt (str) – The input sequence that the model should continue.

Returns:

  • answer (str) – The text answer generated by the model.

  • terminated (bool) – Whether the model answer ended with an end-of-sequence token, the alternative being that the model answer exceeds the specified max_tokens length.

property context

Maximum sequence length that the provided model can handle.

property eos_id

Integer ID of the end-of-sequence token.

logits(src, mask, more)

Call model to produce un-normalized probabilities over the vocab.

Parameters:
  • src (Tensor) – PyTorch tensor with the prompt converted to integer token IDs of shape (1, S) with S being the number of tokens.

  • mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.

  • more (bool) – If True, the input to the model ends with an end-of-sequence (EOS) token and, therefore, the model must predict at least one token that is not EOS. If False, the model may predict EOS first, but must follow that up with at least one non-EOS token.

Returns:

  • logits (Tensor) – 1-D PyTorch tensor with un-normalized probabilities over all permissible tokens in the vocabulary.

  • offset (int) – Index of the token that the first entry in logits refers to.

property max_k

Maximum permissible value for k.

next_token_from_logits(logits)

Randomly draw the next token from the top-k most probable ones.

Parameters:

logits (Tensor) – 1-D PyTorch tensor with un-normalized probabilities over all permissible tokens in the vocabulary.

Returns:

Int64 scalar with the ID of the next token randomly chosen from top-k candidates in logits.

Return type:

Tensor

predict(src, mask, more)

Predict one permissible token at a time until end-of-sequence.

Parameters:
  • src (Tensor) – PyTorch tensor with the prompt converted to integer token IDs of shape (1, S) with S being the number of tokens.

  • mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.

  • more (bool) – If True, the input to the model ends with an end-of-sequence (EOS) token and, therefore, the model must predict at least one token that is not EOS. If False, the model may predict EOS first, but must follow that up with at least one non-EOS token.

Returns:

Integer token IDs of the model response.

Return type:

list

step(token, src, mask)

Concatenate a newly predicted token to the input sequence.

Parameters:
  • token (Tensor) – The ID of the token to append to src as an int64 tensor broadcastable to shape (1, 1).

  • src (Tensor) – The input sequence of shape (1, S) with S being the number of tokens.

  • mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.

Returns:

  • src (Tensor) – The input sequence of maximum length context with the new token appended to it.

  • mask (Tensor) – The input mask of maximum length context with a 0 appended.

property vocab

Actual vocabulary size of the tokenizer to use.

property zero

Tensor of one 0 shaped to append to the unknown-token mask.

class TopP(tokenizer, model, max_tokens=256, p=1.0, temperature=1.0, **_)

Bases: NextToken

Randomly draw the next token from the top fraction of probability.

If the model is very sure about what the next token should be, then most of the probability mass will be concentrated on that token and the random draw will be from among very few tokens. If, in contrast, the model is not so sure, and the probability mass is more widely distributed, then the random draw will be from among many more candidate tokens.

Parameters:
  • tokenizer (Algo) – Fully configured Algo wrapper around a trained tokenizer.

  • model (Module) – The trained PyTorch model to use for text generation.

  • max_tokens (int, optional) – The maximum number of tokens to generate in case the end-of-sequence token is not predicted by the model first. Defaults to 256.

  • p (float, optional) – Candidate tokens to draw from are chosen by ranking all in order of descending probability and taking as many as possible before the sum of their individual probabilities exceeds p. Defaults to 1.0, which results in a draw from a categorical distribution over all eligible tokens.

  • temperature (float, optional) – Higher temperatures concentrate more probability mass onto the most likely tokens, while lower temperatures spread the probability mass out among all eligible tokens. Defaults to 1.0, which results in unmodified logits.
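The candidate selection described for p can be sketched as follows. This is a plain-Python illustration with hypothetical helper names (softmax, nucleus, top_p_sample). It assumes that the single most probable token is always kept, even if its probability alone exceeds p, and it omits temperature scaling.

```python
import math
import random


def softmax(logits):
    """Normalize logits into probabilities in a numerically stable way."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(probs, p):
    """Positions of the most probable entries whose summed probability stays <= p."""
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    kept, total = [], 0.0
    for i in ranked:
        # Small tolerance so that p=1.0 is not defeated by floating-point rounding.
        if kept and total + probs[i] > p + 1e-12:
            break
        kept.append(i)  # assumption: the top entry is kept unconditionally
        total += probs[i]
    return kept


def top_p_sample(logits, p=1.0, rng=random):
    """Draw a position from the nucleus, weighted by the original probabilities."""
    probs = softmax(logits)
    kept = nucleus(probs, p)
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


print(nucleus([0.5, 0.3, 0.2], 0.9))
```

When one token dominates the distribution, the nucleus shrinks to that token alone; when the distribution is flat, many positions survive the cut.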

__call__(prompt)

Iteratively generate a text answer from the model responses.

Parameters:

prompt (str) – The input sequence that the model should continue.

Returns:

  • answer (str) – The text answer generated by the model.

  • terminated (bool) – Whether the model answer ended with an end-of-sequence token, the alternative being that the model answer exceeds the specified max_tokens length.

property context

Maximum sequence length that the provided model can handle.

property eos_id

Integer ID of the end-of-sequence token.

logits(src, mask, more)

Call model to produce un-normalized probabilities over the vocab.

Parameters:
  • src (Tensor) – PyTorch tensor with the prompt converted to integer token IDs of shape (1, S) with S being the number of tokens.

  • mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.

  • more (bool) – If True, the input to the model ends with an end-of-sequence (EOS) token and, therefore, the model must predict at least one token that is not EOS. If False, the model may predict EOS first, but must follow that up with at least one non-EOS token.

Returns:

  • logits (Tensor) – 1-D PyTorch tensor with un-normalized probabilities over all permissible tokens in the vocabulary.

  • offset (int) – Index of the token that the first entry in logits refers to.

next_token_from_logits(logits)

Randomly draw the next token from among the most probable ones.

Parameters:

logits (Tensor) – 1-D PyTorch tensor with un-normalized probabilities over all permissible tokens in the vocabulary.

Returns:

Int64 scalar with the ID of the next token randomly chosen from the top candidates that together have a probability of p.

Return type:

Tensor

predict(src, mask, more)

Predict one permissible token at a time until end-of-sequence.

Parameters:
  • src (Tensor) – PyTorch tensor with the prompt converted to integer token IDs of shape (1, S) with S being the number of tokens.

  • mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.

  • more (bool) – If True, the input to the model ends with an end-of-sequence (EOS) token and, therefore, the model must predict at least one token that is not EOS. If False, the model may predict EOS first, but must follow that up with at least one non-EOS token.

Returns:

Integer token IDs of the model response.

Return type:

list

step(token, src, mask)

Concatenate a newly predicted token to the input sequence.

Parameters:
  • token (Tensor) – The ID of the token to append to src as an int64 tensor broadcastable to shape (1, 1).

  • src (Tensor) – The input sequence of shape (1, S) with S being the number of tokens.

  • mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.

Returns:

  • src (Tensor) – The input sequence of maximum length context with the new token appended to it.

  • mask (Tensor) – The input mask of maximum length context with a 0 appended.

property vocab

Actual vocabulary size of the tokenizer to use.

property zero

Tensor of one 0 shaped to append to the unknown-token mask.

class BeamSearch(tokenizer, model, max_tokens=256, width=4, boost=1.0, **_)

Bases: Generator

Perform a beam search for the most likely sequence of predicted tokens.

Parameters:
  • tokenizer (Algo) – Fully configured Algo wrapper around a trained tokenizer.

  • model (Module) – The trained PyTorch model to use for text generation.

  • max_tokens (int, optional) – The maximum number of tokens to generate in case the end-of-sequence token is not predicted by the model first. Defaults to 256.

  • width (int, optional) – The width of the beam search. Defaults to 4.

  • boost (float, optional) – Boost the length of the generated answers. The higher this number, the longer the answers. Conversely, lower numbers promote shorter answers. Defaults to 1.0, which ranks answers purely on their (log-)probabilities.
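For orientation, a generic beam search over a toy model might look like the sketch below. It is not this class's implementation: the callable log_probs and the toy model toy_log_probs are hypothetical stand-ins, hypotheses are ranked purely by summed log-probability, and boost is left out because its exact formula is not given here.

```python
import math


def beam_search(log_probs, eos_id, width=4, max_tokens=256):
    """Return the token list with the highest summed log-probability found."""
    beams = [(0.0, [])]  # (score, tokens generated so far)
    for _ in range(max_tokens):
        candidates = []
        for score, tokens in beams:
            if tokens and tokens[-1] == eos_id:
                candidates.append((score, tokens))  # finished, carry over unchanged
                continue
            for token, logp in enumerate(log_probs(tokens)):
                candidates.append((score + logp, tokens + [token]))
        # Keep only the `width` best hypotheses for the next round.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:width]
        if all(tokens[-1] == eos_id for _, tokens in beams):
            break
    return beams[0][1]


def toy_log_probs(tokens):
    """Hypothetical 3-token model in which the token with ID 2 is EOS."""
    if not tokens:
        probs = [0.6, 0.4, 0.0]
    elif tokens[-1] == 0:
        probs = [0.3, 0.3, 0.4]
    else:
        probs = [0.05, 0.05, 0.9]
    return [math.log(p) if p > 0 else -math.inf for p in probs]


print(beam_search(toy_log_probs, eos_id=2, width=2))
print(beam_search(toy_log_probs, eos_id=2, width=1))
```

In this toy setup a width of 1 behaves like greedy decoding and commits to token 0 first, ending with a sequence of probability 0.24, while a width of 2 keeps the initially less likely token 1 alive and finds the sequence with probability 0.36.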

__call__(prompt)

Iteratively generate a text answer from the model responses.

Parameters:

prompt (str) – The input sequence that the model should continue.

Returns:

  • answer (str) – The text answer generated by the model.

  • terminated (bool) – Whether the model answer ended with an end-of-sequence token, the alternative being that the model answer exceeds the specified max_tokens length.

property context

Maximum sequence length that the provided model can handle.

property eos_id

Integer ID of the end-of-sequence token.

predict(src, mask, more)

Beam search for the most likely sequence of predicted tokens.

Parameters:
  • src (Tensor) – PyTorch tensor with the prompt converted to integer token IDs of shape (1, S) with S being the number of tokens.

  • mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.

  • more (bool) – If True, the input to the model ends with an end-of-sequence (EOS) token and, therefore, the model must predict at least one token that is not EOS. If False, the model may predict EOS first, but must follow that up with at least one non-EOS token.

Returns:

Integer token IDs of the model response.

Return type:

list

property vocab

Actual vocabulary size of the tokenizer to use.

property zero

Tensor of one 0 shaped to append to the unknown-token mask.

Base classes

class Generator(tokenizer, model, max_tokens=256, width=1, **_)

Bases: ABC

Abstract base class for text generators to inherit from.

Parameters:
  • tokenizer (Algo) – Fully configured Algo wrapper around a trained tokenizer.

  • model (Module) – The trained PyTorch model to use for text generation.

  • max_tokens (int, optional) – The maximum number of tokens to generate in case the end-of-sequence token is not predicted by the model first. Defaults to 256.

  • width (int, optional) – Batch size to initialize buffers for the model. Defaults to 1.

Note

Child classes should also accept any number of additional, potentially unused keyword arguments so that they can be used as drop-in replacements for each other.
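The effect of the catch-all keyword arguments can be illustrated with two stand-in classes. GreedyLike and TopKLike are hypothetical and only mimic the constructor signatures documented on this page; the strings passed as tokenizer and model are placeholders.

```python
class GreedyLike:
    """Stand-in with the documented Greedy signature."""

    def __init__(self, tokenizer, model, max_tokens=256, **_):
        self.max_tokens = max_tokens


class TopKLike:
    """Stand-in with the documented TopK signature."""

    def __init__(self, tokenizer, model, max_tokens=256, k=None, temperature=1.0, **_):
        self.max_tokens = max_tokens
        self.k = k
        self.temperature = temperature


# One shared configuration; each class picks what it needs and ignores the rest.
config = {"max_tokens": 64, "k": 10, "temperature": 0.8, "width": 4}
generators = [cls("tokenizer", "model", **config) for cls in (GreedyLike, TopKLike)]
print([type(g).__name__ for g in generators])
```

Without the trailing **_, GreedyLike would raise a TypeError for the unexpected k, temperature, and width entries.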

__call__(prompt)

Iteratively generate a text answer from the model responses.

Parameters:

prompt (str) – The input sequence that the model should continue.

Returns:

  • answer (str) – The text answer generated by the model.

  • terminated (bool) – Whether the model answer ended with an end-of-sequence token, the alternative being that the model answer exceeds the specified max_tokens length.

property context

Maximum sequence length that the provided model can handle.

property eos_id

Integer ID of the end-of-sequence token.

abstract predict(src, mask, more)

Subclasses implement how the model actually generates an answer.

Parameters:
  • src (Tensor) – PyTorch tensor with the prompt converted to integer token IDs of shape (1, S) with S being the number of tokens.

  • mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.

  • more (bool) – If True, the input to the model ends with an end-of-sequence (EOS) token and, therefore, the model must predict at least one token that is not EOS. If False, the model may predict EOS first, but must follow that up with at least one non-EOS token.

Returns:

Integer token IDs of the model response.

Return type:

list

property vocab

Actual vocabulary size of the tokenizer to use.

property zero

Tensor of one 0 shaped to append to the unknown-token mask.

class NextToken(tokenizer, model, max_tokens=256, **_)

Bases: Generator

Abstract base class for strictly next-token-prediction generators.

Parameters:
  • tokenizer (Algo) – Fully configured Algo wrapper around a trained tokenizer.

  • model (Module) – The trained PyTorch model to use for text generation.

  • max_tokens (int, optional) – The maximum number of tokens to generate in case the end-of-sequence token is not predicted by the model first. Defaults to 256.

Note

Child classes should also accept any number of additional, potentially unused keyword arguments so that they can be used as drop-in replacements for each other.

logits(src, mask, more)

Call model to produce un-normalized probabilities over the vocab.

Parameters:
  • src (Tensor) – PyTorch tensor with the prompt converted to integer token IDs of shape (1, S) with S being the number of tokens.

  • mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.

  • more (bool) – If True, the input to the model ends with an end-of-sequence (EOS) token and, therefore, the model must predict at least one token that is not EOS. If False, the model may predict EOS first, but must follow that up with at least one non-EOS token.

Returns:

  • logits (Tensor) – 1-D PyTorch tensor with un-normalized probabilities over all permissible tokens in the vocabulary.

  • offset (int) – Index of the token that the first entry in logits refers to.
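One plausible reading of offset, sketched with plain lists: if logits covers only a contiguous range of permissible tokens, a position within logits maps back to a vocabulary ID by adding offset. The contiguity and the helper name to_token_id are assumptions made for illustration.

```python
def to_token_id(position, offset):
    """Map a position within the permissible-token logits to a vocabulary ID."""
    return position + offset


full_logits = [9.0, 0.2, 1.7, 0.4]  # hypothetical scores for token IDs 0..3
offset = 1                          # suppose token 0 is not permissible here
logits = full_logits[offset:]       # scores for token IDs 1..3 only

position = max(range(len(logits)), key=logits.__getitem__)
token_id = to_token_id(position, offset)
print(position, token_id)
```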

abstract next_token_from_logits(logits)

Subclasses must implement how the next token is chosen from logits.

Parameters:

logits (Tensor) – 1-D PyTorch tensor with un-normalized probabilities over all permissible tokens in the vocabulary.

Returns:

Int64 scalar with the ID of the single next token chosen on the basis of the logits.

Return type:

Tensor

predict(src, mask, more)

Predict one permissible token at a time until end-of-sequence.

Parameters:
  • src (Tensor) – PyTorch tensor with the prompt converted to integer token IDs of shape (1, S) with S being the number of tokens.

  • mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.

  • more (bool) – If True, the input to the model ends with an end-of-sequence (EOS) token and, therefore, the model must predict at least one token that is not EOS. If False, the model may predict EOS first, but must follow that up with at least one non-EOS token.

Returns:

Integer token IDs of the model response.

Return type:

list
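The overall loop can be pictured with the following simplified sketch. It ignores the attention mask and the more flag, uses plain lists, and assumes the EOS token is included in the returned IDs. The names generate and scripted are hypothetical, with scripted standing in for the model plus the token-selection rule.

```python
def generate(next_token, prompt_ids, eos_id, max_tokens=256):
    """Collect token IDs one at a time until EOS appears or the budget runs out."""
    sequence = list(prompt_ids)
    answer = []
    for _ in range(max_tokens):
        token = next_token(sequence)
        answer.append(token)
        sequence.append(token)
        if token == eos_id:
            break
    return answer


def scripted(sequence):
    """Hypothetical stand-in for the model: counts up, then emits EOS (ID 0)."""
    return 0 if sequence[-1] >= 3 else sequence[-1] + 1


answer = generate(scripted, [1], eos_id=0)
terminated = bool(answer) and answer[-1] == 0
print(answer, terminated)
```

A caller could derive the terminated flag of __call__ in the way shown on the last lines: it is true only if generation stopped because EOS was produced rather than because max_tokens ran out.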

step(token, src, mask)

Concatenate a newly predicted token to the input sequence.

Parameters:
  • token (Tensor) – The ID of the token to append to src as an int64 tensor broadcastable to shape (1, 1).

  • src (Tensor) – The input sequence of shape (1, S) with S being the number of tokens.

  • mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.

Returns:

  • src (Tensor) – The input sequence of maximum length context with the new token appended to it.

  • mask (Tensor) – The input mask of maximum length context with a 0 appended.