Welcome to slangmod’s documentation!

Small language model.

Ever wondered how large language models (LLMs) like ChatGPT, Claude, Llama, DeepSeek, etc., actually work, like, really work? I did. And I figured there is only one way to find out: Make one yourself. From scratch.

Of course, I wasn’t expecting to beat the big players at their own game, but I wanted to know what you can do on consumer hardware (meaning a state-of-the-art gaming PC with a single graphics card supported by PyTorch). So, naturally, it was going to be a small language model. These hardware limitations are reflected in software design choices. Specifically, slangmod does not employ any type of parallelization that would keep multiple GPUs busy at the same time, and all training data are loaded into CPU RAM at once, to be drip-fed to the model on the GPU from there (1 billion tokens stored as 64-bit integers take up 8 GB, or about 7.5 GiB).
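The memory estimate above is plain arithmetic, and a short sketch makes it easy to redo for your own corpus size. The function name `token_memory_bytes` is made up for this illustration and is not part of slangmod; the only assumption taken from the text is that each token ID is stored as a 64-bit (8-byte) integer.

```python
def token_memory_bytes(n_tokens: int, bytes_per_token: int = 8) -> int:
    """Memory needed to hold n_tokens integer token IDs.

    Hypothetical helper for illustration only. The default of 8 bytes
    per token corresponds to 64-bit integers.
    """
    return n_tokens * bytes_per_token


n_bytes = token_memory_bytes(1_000_000_000)  # 1 billion tokens
print(f"{n_bytes / 10**9:.2f} GB")   # decimal gigabytes: 8.00 GB
print(f"{n_bytes / 2**30:.2f} GiB")  # binary gibibytes:  7.45 GiB
```

Whether a corpus fits therefore depends on how much RAM is left after the operating system and everything else have taken their share, so it is worth running the numbers before tokenizing a large corpus.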

Having said that, slangmod provides everything you need to

preprocess and clean your text corpus;
choose and train one of the HuggingFace tokenizers;
specify a Transformer model including the type of positional encodings and the feedforward block;
train your model with a choice of optimizers and learning-rate schedulers, employing early stopping if you like;
monitor convergence and experiment on hyperparameters;
explore text-generation algorithms like top-k, top-p or beam search;
and, finally, chat with your model.
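
To give a feel for two of the text-generation algorithms named in the list, here is a minimal, dependency-free sketch of top-k and top-p (nucleus) filtering. It is not slangmod's implementation and does not use its API: all function names and the toy logits are invented for this illustration, and beam search is left out for brevity.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities that sum to one."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize their probabilities."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in keep}


def sample(filtered, rng=random):
    """Draw one token ID from a {token_id: probability} mapping."""
    ids = list(filtered)
    return rng.choices(ids, weights=[filtered[i] for i in ids], k=1)[0]


# Made-up scores for a 5-token vocabulary (hypothetical data).
logits = [2.0, 1.0, 0.5, -1.0, -3.0]
probs = softmax(logits)

top_k = top_k_filter(probs, k=2)    # keeps token IDs 0 and 1
top_p = top_p_filter(probs, p=0.9)  # keeps token IDs 0, 1 and 2
next_token = sample(top_k, random.Random(0))  # seeded for reproducibility
```

Top-k always keeps a fixed number of candidates, whereas top-p adapts the number of candidates to how peaked the distribution is. In a real model the logits would come from the final layer of the Transformer at each generation step.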

To do all these things, slangmod provides a command-line interface (CLI) with fine-grained configuration options on the one hand, and the raw building blocks it is made of on the other. With the foundational functionality provided by the swak package, any other workflow can thus be coded up quickly.

