generators
Various algorithms to produce model responses from user input.
- class Greedy(tokenizer, model, max_tokens=256, **_)[source]
Bases: NextToken
Simply pick the most probable token, one at a time.
- Parameters:
tokenizer (Algo) – Fully configured Algo wrapper around a trained tokenizer.
model (Module) – The trained PyTorch model to use for text generation.
max_tokens (int, optional) – The maximum number of tokens to generate in case the end-of-sequence token is not predicted by the model first. Defaults to 256.
- __call__(prompt)
Iteratively generate a text answer from the model responses.
- Parameters:
prompt (str) – The input sequence that the model should continue.
- Returns:
answer (str) – The text answer generated by the model.
terminated (bool) – Whether the model answer ended with an end-of-sequence token, the alternative being that the model answer exceeds the specified max_tokens length.
- property context
Maximum sequence length that the provided model can handle.
- property eos_id
Integer ID of the end-of-sequence token.
- logits(src, mask, more)
Call model to produce un-normalized probabilities over the vocab.
- Parameters:
src (Tensor) – PyTorch tensor with the prompt converted to integer token IDs of shape (1, S) with S being the number of tokens.
mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.
more (bool) – If True, the input to the model ends with an end-of-sequence (EOS) token and, therefore, the model must predict at least one token that is not EOS. If False, the model may predict EOS first, but must follow that up with at least one non-EOS token.
- Returns:
logits (Tensor) – 1-D PyTorch tensor with un-normalized probabilities over all permissible tokens in the vocabulary.
offset (int) – Index of the token that the first entry in logits refers to.
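To illustrate what an additive attention mask does, here is a minimal sketch in plain Python rather than PyTorch. The function name `masked_softmax` and the use of lists instead of tensors are assumptions made for illustration only; they are not part of this package.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over attention scores after adding an additive mask.

    Hypothetical helper for illustration. Assumes at least one position
    is unmasked, i.e., has a mask value of 0.
    """
    # Adding -inf removes a position; adding 0 leaves its score unchanged.
    shifted = [s + m for s, m in zip(scores, mask)]
    # Subtract the maximum for numerical stability.
    peak = max(shifted)
    exps = [math.exp(x - peak) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]


# The second position is masked out and receives zero attention weight.
weights = masked_softmax([1.0, 1.0, 1.0], [0.0, float('-inf'), 0.0])
```

Because `exp(-inf)` evaluates to 0, masked positions contribute nothing to the normalization and the remaining weights still sum to one.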
- next_token_from_logits(logits)[source]
Take the argmax over the probabilities of permissible tokens.
- Parameters:
logits (Tensor) – 1-D PyTorch tensor with un-normalized probabilities over all permissible tokens in the vocabulary.
- Returns:
Int64 scalar with the argmax of logits.
- Return type:
Tensor
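As a rough illustration of greedy selection, the following plain-Python sketch picks the position of the largest logit. The function name `greedy_next_token` is hypothetical, and adding `offset` to turn a position in `logits` into a vocabulary ID is an assumption based on the description of the `offset` return value of `logits`; the library itself operates on PyTorch tensors.

```python
def greedy_next_token(logits, offset=0):
    """Return the ID of the highest-scoring permissible token.

    Hypothetical helper for illustration; `logits` is a plain list here.
    """
    # Position of the largest logit; ties resolve to the first occurrence.
    best = max(range(len(logits)), key=logits.__getitem__)
    # Assumption: the first entry of `logits` refers to token ID `offset`.
    return best + offset


token_id = greedy_next_token([0.1, 2.5, -1.0, 2.4])
```

Since the choice is deterministic, the same prompt always yields the same answer under this strategy.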
- predict(src, mask, more)
Predict one permissible token at a time until end-of-sequence.
- Parameters:
src (Tensor) – PyTorch tensor with the prompt converted to integer token IDs of shape (1, S) with S being the number of tokens.
mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.
more (bool) – If True, the input to the model ends with an end-of-sequence (EOS) token and, therefore, the model must predict at least one token that is not EOS. If False, the model may predict EOS first, but must follow that up with at least one non-EOS token.
- Returns:
Integer token IDs of the model response.
- Return type:
list
- step(token, src, mask)
Concatenate a newly predicted token to the input sequence.
- Parameters:
token (Tensor) – The ID of the token to append to src as an int64 tensor broadcastable to shape (1, 1).
src (Tensor) – The input sequence of shape (1, S) with S being the number of tokens.
mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.
- Returns:
src (Tensor) – The input sequence of maximum length context with the new token appended to it.
mask (Tensor) – The input mask of maximum length context with a 0 appended.
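The bookkeeping described for `step` can be sketched with plain lists in place of tensors of shape (1, S). That the oldest entries are the ones dropped once the sequence exceeds the context length is an assumption made for this sketch, as is passing `context` as an explicit argument.

```python
def step(token, src, mask, context):
    """Append a token and a mask entry, keeping at most `context` items.

    Illustrative sketch only. `context` must be a positive integer.
    """
    # Append the new token ID and an "attend to this" mask value of 0.
    src = (src + [token])[-context:]
    mask = (mask + [0.0])[-context:]
    return src, mask


# With a context of 3, the oldest token is dropped when a fourth arrives.
src, mask = step(7, [1, 2, 3], [0.0, 0.0, 0.0], context=3)
```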
- property vocab
Actual vocabulary size of the tokenizer to use.
- property zero
Tensor of one 0 shaped to append to the unknown-token mask.
- class TopK(tokenizer, model, max_tokens=256, k=None, temperature=1.0, **_)[source]
Bases: NextToken
Randomly draw the next token from among the k most likely ones.
- Parameters:
tokenizer (Algo) – Fully configured Algo wrapper around a trained tokenizer.
model (Module) – The trained PyTorch model to use for text generation.
max_tokens (int, optional) – The maximum number of tokens to generate in case the end-of-sequence token is not predicted by the model first. Defaults to 256.
k (int or float, optional) – If an integer number > 0, the next token is drawn from a categorical distribution over the k most probable of all eligible tokens. A floating point number from the interval (0.0, 1.0) is interpreted as a fraction of all eligible tokens. Defaults to None, resulting in a random draw from a categorical distribution over all eligible tokens.
temperature (float, optional) – Higher temperatures concentrate more probability mass onto the most likely tokens, while lower temperatures spread the probability mass out among all eligible tokens. Defaults to 1.0, which results in unmodified logits.
- __call__(prompt)
Iteratively generate a text answer from the model responses.
- Parameters:
prompt (str) – The input sequence that the model should continue.
- Returns:
answer (str) – The text answer generated by the model.
terminated (bool) – Whether the model answer ended with an end-of-sequence token, the alternative being that the model answer exceeds the specified max_tokens length.
- property context
Maximum sequence length that the provided model can handle.
- property eos_id
Integer ID of the end-of-sequence token.
- logits(src, mask, more)
Call model to produce un-normalized probabilities over the vocab.
- Parameters:
src (Tensor) – PyTorch tensor with the prompt converted to integer token IDs of shape (1, S) with S being the number of tokens.
mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.
more (bool) – If True, the input to the model ends with an end-of-sequence (EOS) token and, therefore, the model must predict at least one token that is not EOS. If False, the model may predict EOS first, but must follow that up with at least one non-EOS token.
- Returns:
logits (Tensor) – 1-D PyTorch tensor with un-normalized probabilities over all permissible tokens in the vocabulary.
offset (int) – Index of the token that the first entry in logits refers to.
- property max_k
Maximum permissible value for k.
- next_token_from_logits(logits)[source]
Randomly draw the next token from the top-k most probable ones.
- Parameters:
logits (Tensor) – 1-D PyTorch tensor with un-normalized probabilities over all permissible tokens in the vocabulary.
- Returns:
Int64 scalar with the ID of the next token randomly chosen from top-k candidates in logits.
- Return type:
Tensor
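The top-k draw can be sketched in plain Python as follows. The function name `top_k_sample`, the `rng` argument, and the rounding of a fractional `k` (truncated, but at least one token) are assumptions made for illustration; temperature scaling is left out on purpose.

```python
import math
import random


def top_k_sample(logits, k=None, rng=random):
    """Draw a position from the k highest logits, weighted by softmax.

    Hypothetical helper for illustration; `logits` is a plain list.
    """
    n = len(logits)
    if k is None:
        k = n  # No restriction: draw from all eligible tokens.
    elif isinstance(k, float):
        # Assumption: a fraction is truncated, keeping at least one token.
        k = max(1, int(k * n))
    elif k <= 0:
        raise ValueError('k must be a positive integer!')
    k = min(k, n)
    # Positions of the k largest logits, in descending order.
    ranked = sorted(range(n), key=logits.__getitem__, reverse=True)[:k]
    # Un-normalized softmax weights, shifted for numerical stability.
    peak = logits[ranked[0]]
    weights = [math.exp(logits[i] - peak) for i in ranked]
    return rng.choices(ranked, weights=weights, k=1)[0]


# With k=1 the draw degenerates to the greedy choice.
token = top_k_sample([0.0, 5.0, 1.0], k=1)
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible.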
- predict(src, mask, more)
Predict one permissible token at a time until end-of-sequence.
- Parameters:
src (Tensor) – PyTorch tensor with the prompt converted to integer token IDs of shape (1, S) with S being the number of tokens.
mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.
more (bool) – If True, the input to the model ends with an end-of-sequence (EOS) token and, therefore, the model must predict at least one token that is not EOS. If False, the model may predict EOS first, but must follow that up with at least one non-EOS token.
- Returns:
Integer token IDs of the model response.
- Return type:
list
- step(token, src, mask)
Concatenate a newly predicted token to the input sequence.
- Parameters:
token (Tensor) – The ID of the token to append to src as an int64 tensor broadcastable to shape (1, 1).
src (Tensor) – The input sequence of shape (1, S) with S being the number of tokens.
mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.
- Returns:
src (Tensor) – The input sequence of maximum length context with the new token appended to it.
mask (Tensor) – The input mask of maximum length context with a 0 appended.
- property vocab
Actual vocabulary size of the tokenizer to use.
- property zero
Tensor of one 0 shaped to append to the unknown-token mask.
- class TopP(tokenizer, model, max_tokens=256, p=1.0, temperature=1.0, **_)[source]
Bases: NextToken
Randomly draw the next token from the top fraction of probability.
If the model is very sure about what the next token should be, then most of the probability mass will be concentrated on that token and the random draw will be from among very few tokens. If, in contrast, the model is not so sure, and the probability mass is more widely distributed, then the random draw will be from among many more candidate tokens.
- Parameters:
tokenizer (Algo) – Fully configured Algo wrapper around a trained tokenizer.
model (Module) – The trained PyTorch model to use for text generation.
max_tokens (int, optional) – The maximum number of tokens to generate in case the end-of-sequence token is not predicted by the model first. Defaults to 256.
p (float, optional) – Candidate tokens to draw from are chosen by ranking all in order of descending probability and taking as many as possible before the sum of their individual probabilities exceeds p. Defaults to 1.0, which results in a draw from a categorical distribution over all eligible tokens.
temperature (float, optional) – Higher temperatures concentrate more probability mass onto the most likely tokens, while lower temperatures spread the probability mass out among all eligible tokens. Defaults to 1.0, which results in unmodified logits.
- __call__(prompt)
Iteratively generate a text answer from the model responses.
- Parameters:
prompt (str) – The input sequence that the model should continue.
- Returns:
answer (str) – The text answer generated by the model.
terminated (bool) – Whether the model answer ended with an end-of-sequence token, the alternative being that the model answer exceeds the specified max_tokens length.
- property context
Maximum sequence length that the provided model can handle.
- property eos_id
Integer ID of the end-of-sequence token.
- logits(src, mask, more)
Call model to produce un-normalized probabilities over the vocab.
- Parameters:
src (Tensor) – PyTorch tensor with the prompt converted to integer token IDs of shape (1, S) with S being the number of tokens.
mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.
more (bool) – If True, the input to the model ends with an end-of-sequence (EOS) token and, therefore, the model must predict at least one token that is not EOS. If False, the model may predict EOS first, but must follow that up with at least one non-EOS token.
- Returns:
logits (Tensor) – 1-D PyTorch tensor with un-normalized probabilities over all permissible tokens in the vocabulary.
offset (int) – Index of the token that the first entry in logits refers to.
- next_token_from_logits(logits)[source]
Randomly draw the next token from among the most probable ones.
- Parameters:
logits (Tensor) – 1-D PyTorch tensor with un-normalized probabilities over all permissible tokens in the vocabulary.
- Returns:
Int64 scalar with the ID of the next token randomly chosen from the top candidates that together have a probability of p.
- Return type:
Tensor
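The candidate selection described for `p` can be sketched in plain Python. The names `top_p_candidates` and `top_p_sample`, the `rng` argument, the small floating-point tolerance, and the rule of always keeping the single most probable token are assumptions made for this illustration; temperature scaling is left out.

```python
import math
import random


def top_p_candidates(logits, p=1.0):
    """Return positions and probabilities of the top-p candidates.

    Hypothetical helper for illustration; `logits` is a plain list.
    """
    # Softmax, shifted by the maximum for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    chosen, cumulative = [], 0.0
    for i in ranked:
        # Assumption: always keep the top token, then stop as soon as
        # adding another one would push the cumulative sum above p.
        if chosen and cumulative + probs[i] > p + 1e-12:
            break
        chosen.append(i)
        cumulative += probs[i]
    return chosen, [probs[i] for i in chosen]


def top_p_sample(logits, p=1.0, rng=random):
    """Draw one position from among the top-p candidates."""
    positions, weights = top_p_candidates(logits, p)
    return rng.choices(positions, weights=weights, k=1)[0]


# Probabilities of roughly 0.5, 0.3, and 0.2 for the three tokens.
logits = [math.log(0.5), math.log(0.3), math.log(0.2)]
positions, _ = top_p_candidates(logits, p=0.85)
```

When one token carries nearly all of the probability mass, the candidate list shrinks to that token alone; with a flat distribution, many more tokens stay in the draw.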
- predict(src, mask, more)
Predict one permissible token at a time until end-of-sequence.
- Parameters:
src (Tensor) – PyTorch tensor with the prompt converted to integer token IDs of shape (1, S) with S being the number of tokens.
mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.
more (bool) – If True, the input to the model ends with an end-of-sequence (EOS) token and, therefore, the model must predict at least one token that is not EOS. If False, the model may predict EOS first, but must follow that up with at least one non-EOS token.
- Returns:
Integer token IDs of the model response.
- Return type:
list
- step(token, src, mask)
Concatenate a newly predicted token to the input sequence.
- Parameters:
token (Tensor) – The ID of the token to append to src as an int64 tensor broadcastable to shape (1, 1).
src (Tensor) – The input sequence of shape (1, S) with S being the number of tokens.
mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.
- Returns:
src (Tensor) – The input sequence of maximum length context with the new token appended to it.
mask (Tensor) – The input mask of maximum length context with a 0 appended.
- property vocab
Actual vocabulary size of the tokenizer to use.
- property zero
Tensor of one 0 shaped to append to the unknown-token mask.
- class BeamSearch(tokenizer, model, max_tokens=256, width=4, boost=1.0, **_)[source]
Bases: Generator
Perform a beam search for the most likely sequence of predicted tokens.
- Parameters:
tokenizer (Algo) – Fully configured Algo wrapper around a trained tokenizer.
model (Module) – The trained PyTorch model to use for text generation.
max_tokens (int, optional) – The maximum number of tokens to generate in case the end-of-sequence token is not predicted by the model first. Defaults to 256.
width (int, optional) – The width of the beam search. Defaults to 4.
boost (float, optional) – Boost the length of the generated answers. The higher this number, the longer the answers. Conversely, lower numbers promote shorter answers. Defaults to 1.0, which ranks answers purely on their (log-)probabilities.
- __call__(prompt)
Iteratively generate a text answer from the model responses.
- Parameters:
prompt (str) – The input sequence that the model should continue.
- Returns:
answer (str) – The text answer generated by the model.
terminated (bool) – Whether the model answer ended with an end-of-sequence token, the alternative being that the model answer exceeds the specified max_tokens length.
- property context
Maximum sequence length that the provided model can handle.
- property eos_id
Integer ID of the end-of-sequence token.
- predict(src, mask, more)[source]
Beam search for the most likely sequence of predicted tokens.
- Parameters:
src (Tensor) – PyTorch tensor with the prompt converted to integer token IDs of shape (1, S) with S being the number of tokens.
mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.
more (bool) – If True, the input to the model ends with an end-of-sequence (EOS) token and, therefore, the model must predict at least one token that is not EOS. If False, the model may predict EOS first, but must follow that up with at least one non-EOS token.
- Returns:
Integer token IDs of the model response.
- Return type:
list
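The general idea of a beam search can be sketched in plain Python with a toy model. Everything here is an assumption for illustration: the callable `next_log_probs`, the toy probability table, and the choice to include the end-of-sequence token in the result. The `boost` parameter is deliberately omitted, so candidates are ranked on cumulative log-probability alone. This is a textbook variant, not necessarily how this class is implemented.

```python
import math


def beam_search(next_log_probs, eos_id, width=4, max_tokens=256):
    """Return the most probable token sequence found by a beam search.

    `next_log_probs` is a hypothetical callable that maps a list of token
    IDs to a list of log-probabilities, one per vocabulary entry.
    """
    beams = [([], 0.0)]  # Pairs of (tokens so far, cumulative log-prob).
    finished = []
    for _ in range(max_tokens):
        # Extend every open beam by every possible next token.
        candidates = [
            (tokens + [token], score + logp)
            for tokens, score in beams
            for token, logp in enumerate(next_log_probs(tokens))
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        # Keep the best `width` candidates; those ending in EOS are done.
        beams = []
        for tokens, score in candidates[:width]:
            target = finished if tokens[-1] == eos_id else beams
            target.append((tokens, score))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])[0]


def toy_model(tokens):
    """Toy next-token distribution over a vocabulary of 3 (EOS has ID 2)."""
    table = {(): [0.6, 0.35, 0.05], (0,): [0.4, 0.3, 0.3], (1,): [0.05, 0.05, 0.9]}
    return [math.log(q) for q in table.get(tuple(tokens), [0.1, 0.1, 0.8])]


best = beam_search(toy_model, eos_id=2, width=2, max_tokens=10)
```

In this toy setup, a width of 1 behaves like a greedy search and commits to token 0 first, ending up with a less probable sequence overall, whereas a width of 2 keeps the initially less likely token 1 around long enough to find the better answer.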
- property vocab
Actual vocabulary size of the tokenizer to use.
- property zero
Tensor of one 0 shaped to append to the unknown-token mask.
Base classes
- class Generator(tokenizer, model, max_tokens=256, width=1, **_)[source]
Bases: ABC
Abstract base class for text generators to inherit from.
- Parameters:
tokenizer (Algo) – Fully configured Algo wrapper around a trained tokenizer.
model (Module) – The trained PyTorch model to use for text generation.
max_tokens (int, optional) – The maximum number of tokens to generate in case the end-of-sequence token is not predicted by the model first. Defaults to 256.
width (int, optional) – Batch size to initialize buffers for the model. Defaults to 1.
Note
Child classes should also accept any number of additional, potentially unused keyword arguments so that they can be used as drop-in replacements for each other.
- __call__(prompt)[source]
Iteratively generate a text answer from the model responses.
- Parameters:
prompt (str) – The input sequence that the model should continue.
- Returns:
answer (str) – The text answer generated by the model.
terminated (bool) – Whether the model answer ended with an end-of-sequence token, the alternative being that the model answer exceeds the specified max_tokens length.
- property context
Maximum sequence length that the provided model can handle.
- property eos_id
Integer ID of the end-of-sequence token.
- abstract predict(src, mask, more)[source]
Subclasses implement how the model actually generates an answer.
- Parameters:
src (Tensor) – PyTorch tensor with the prompt converted to integer token IDs of shape (1, S) with S being the number of tokens.
mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.
more (bool) – If True, the input to the model ends with an end-of-sequence (EOS) token and, therefore, the model must predict at least one token that is not EOS. If False, the model may predict EOS first, but must follow that up with at least one non-EOS token.
- Returns:
Integer token IDs of the model response.
- Return type:
list
- property vocab
Actual vocabulary size of the tokenizer to use.
- property zero
Tensor of one 0 shaped to append to the unknown-token mask.
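The note on accepting unused keyword arguments describes a common pattern, sketched below with purely hypothetical classes (`BaseSketch`, `FirstSketch`, and `LastKSketch` are not part of this package). Each constructor swallows options it does not know via `**_`, so one shared configuration can instantiate any of them.

```python
from abc import ABC, abstractmethod


class BaseSketch(ABC):
    """Hypothetical stand-in for an abstract generator base class."""

    def __init__(self, max_tokens=256, **_):
        # Unknown keyword arguments are accepted and silently ignored.
        self.max_tokens = max_tokens

    @abstractmethod
    def predict(self, ids):
        """Return generated token IDs for the given prompt IDs."""


class FirstSketch(BaseSketch):
    """Toy strategy without options of its own."""

    def predict(self, ids):
        return ids[:self.max_tokens]


class LastKSketch(BaseSketch):
    """Toy strategy with one extra option, `k`."""

    def __init__(self, max_tokens=256, k=1, **_):
        super().__init__(max_tokens)
        self.k = k

    def predict(self, ids):
        return ids[-self.k:][:self.max_tokens]


# One configuration works for both classes, even though only one uses `k`
# and neither uses `width`.
config = {'max_tokens': 2, 'k': 3, 'width': 4}
outputs = [cls(**config).predict([1, 2, 3, 4]) for cls in (FirstSketch, LastKSketch)]
```

The design choice trades strictness for convenience: a misspelled option is ignored rather than reported, which is the price for being able to swap strategies without touching the configuration.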
- class NextToken(tokenizer, model, max_tokens=256, **_)[source]
Bases: Generator
Abstract base class for strictly next-token-prediction generators.
- Parameters:
tokenizer (Algo) – Fully configured Algo wrapper around a trained tokenizer.
model (Module) – The trained PyTorch model to use for text generation.
max_tokens (int, optional) – The maximum number of tokens to generate in case the end-of-sequence token is not predicted by the model first. Defaults to 256.
Note
Child classes should also accept any number of additional, potentially unused keyword arguments so that they can be used as drop-in replacements for each other.
- logits(src, mask, more)[source]
Call model to produce un-normalized probabilities over the vocab.
- Parameters:
src (Tensor) – PyTorch tensor with the prompt converted to integer token IDs of shape (1, S) with S being the number of tokens.
mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.
more (bool) – If True, the input to the model ends with an end-of-sequence (EOS) token and, therefore, the model must predict at least one token that is not EOS. If False, the model may predict EOS first, but must follow that up with at least one non-EOS token.
- Returns:
logits (Tensor) – 1-D PyTorch tensor with un-normalized probabilities over all permissible tokens in the vocabulary.
offset (int) – Index of the token that the first entry in logits refers to.
- abstract next_token_from_logits(logits)[source]
Subclasses must implement how the next token is chosen from logits.
- Parameters:
logits (Tensor) – 1-D PyTorch tensor with un-normalized probabilities over all permissible tokens in the vocabulary.
- Returns:
Int64 scalar with the ID of the single next token chosen on the basis of the logits.
- Return type:
Tensor
- predict(src, mask, more)[source]
Predict one permissible token at a time until end-of-sequence.
- Parameters:
src (Tensor) – PyTorch tensor with the prompt converted to integer token IDs of shape (1, S) with S being the number of tokens.
mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.
more (bool) – If True, the input to the model ends with an end-of-sequence (EOS) token and, therefore, the model must predict at least one token that is not EOS. If False, the model may predict EOS first, but must follow that up with at least one non-EOS token.
- Returns:
Integer token IDs of the model response.
- Return type:
list
- step(token, src, mask)[source]
Concatenate a newly predicted token to the input sequence.
- Parameters:
token (Tensor) – The ID of the token to append to src as an int64 tensor broadcastable to shape (1, 1).
src (Tensor) – The input sequence of shape (1, S) with S being the number of tokens.
mask (Tensor) – Additive attention mask of the same shape with -inf indicating positions that should not be attended to and 0 that they should.
- Returns:
src (Tensor) – The input sequence of maximum length context with the new token appended to it.
mask (Tensor) – The input mask of maximum length context with a 0 appended.