A minimal implementation of a transformer-based Large Language Model (LLM), inspired by the Deepseek V2 architecture and by minimal codebases such as minGPT and nanoGPT. This project includes features like low-rank attention compression, SwiGLU activation, and rotary positional embeddings.
- Multi-head latent attention with low-rank compression for keys, values, and queries (see the sketch after this list).
- SwiGLU-style gated feed-forward layers (the current implementation uses a plain SiLU activation instead).
- Rotary Positional Embeddings (RoPE), which encode positions as rotations of the query/key vectors (also sketched below).
- Lightweight and modular design for easy experimentation.
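The low-rank compression idea behind multi-head latent attention can be shown with a short, self-contained sketch. This is not the code from src/model.py; it is a minimal PyTorch illustration in which keys and values are squeezed through a small shared latent before being expanded back per head, and all dimension names and sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankKVAttention(nn.Module):
    """Illustrative multi-head attention with low-rank (latent) compression of K and V."""

    def __init__(self, embed_dim: int = 512, num_heads: int = 8, kv_latent_dim: int = 64):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.q_proj = nn.Linear(embed_dim, embed_dim, bias=False)       # queries could be compressed the same way
        self.kv_down = nn.Linear(embed_dim, kv_latent_dim, bias=False)  # squeeze hidden states into a small latent
        self.k_up = nn.Linear(kv_latent_dim, embed_dim, bias=False)     # expand latent back to per-head keys
        self.v_up = nn.Linear(kv_latent_dim, embed_dim, bias=False)     # expand latent back to per-head values
        self.out_proj = nn.Linear(embed_dim, embed_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        latent = self.kv_down(x)  # (b, t, kv_latent_dim): this is what a KV cache would store
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_up(latent).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_up(latent).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out_proj(out.transpose(1, 2).reshape(b, t, d))
```

The payoff of this factorisation is that only the small latent needs to be cached during generation, instead of full per-head keys and values.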
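Rotary embeddings can likewise be sketched in a few lines. The half-split channel pairing below is one common convention and is not necessarily the exact layout used in src/model.py.

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate query/key vectors of shape (batch, heads, seq_len, head_dim) by position-dependent angles."""
    *_, seq_len, head_dim = x.shape
    half = head_dim // 2
    # One rotation frequency per channel pair, decaying geometrically with channel index.
    inv_freq = 1.0 / (base ** (torch.arange(half, dtype=x.dtype, device=x.device) / half))
    angles = torch.arange(seq_len, dtype=x.dtype, device=x.device)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()  # each (seq_len, half), broadcast over batch and heads
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
```

Queries and keys are both passed through `apply_rope` before attention scores are computed, so the dot product depends only on relative positions.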
Here's an overview of the key files and directories in this project:
- `src/model.py`: Core implementation of the model, including attention mechanisms, feed-forward layers, and the transformer architecture.
- `src/trainer.py`: Training loop for the model.
- `src/main.py`: Entry point for running the model (inference).
- `src/config.py`: Configuration utilities for model hyperparameters and training settings.
- Clone the repository:

```bash
git clone https://github.com/clement-cvll/open-large-language-model
cd open-large-language-model
```

- Install dependencies:

```bash
uv sync
```
To train the model, use the provided script:

```bash
python src/trainer.py
```

Generate text with the trained model:

```bash
python src/main.py
```

Modify `src/config.py` to adjust hyperparameters such as the following (an illustrative config sketch follows the list):

- `embed_dim`: Embedding dimension.
- `num_attention_heads`: Number of attention heads.
- `num_layers`: Number of transformer layers.
- `device`: Torch device (`mps` is the default).
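As an illustration only (the real `src/config.py` may be organised differently), a configuration of this kind is often a small dataclass whose fields mirror the hyperparameters above; every value below except the `mps` device is an assumption rather than a project default.

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    # Field names follow the hyperparameters listed above; the values are illustrative.
    embed_dim: int = 512
    num_attention_heads: int = 8
    num_layers: int = 6
    device: str = "mps"  # mps is the default torch device per this README

# Override a couple of fields for a quick experiment.
config = ModelConfig(embed_dim=256, num_layers=4)
```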
The tokenizer used in this project is the Pleias-350m-Preview tokenizer from Hugging Face (link).
The model is designed to work with the Common Corpus dataset (link), a large, open, and permissively licensed multilingual dataset.
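A hedged sketch of loading both with the `transformers` and `datasets` libraries is shown below. The Hugging Face hub ids `PleIAs/Pleias-350m-Preview` and `PleIAs/common_corpus`, as well as the `text` column name, are assumptions; confirm them against the linked pages.

```python
from transformers import AutoTokenizer
from datasets import load_dataset

# Hub ids below are assumptions; check the tokenizer and dataset links in this README.
tokenizer = AutoTokenizer.from_pretrained("PleIAs/Pleias-350m-Preview")
dataset = load_dataset("PleIAs/common_corpus", split="train", streaming=True)

sample = next(iter(dataset))
# The "text" column name is also an assumption about the dataset schema.
ids = tokenizer(sample["text"], truncation=True, max_length=512)["input_ids"]
print(ids[:20])
```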
This project draws inspiration from:
- minGPT: A minimal PyTorch re-implementation of GPT by Andrej Karpathy.
- nanoGPT: A simplified and efficient GPT implementation, also by Andrej Karpathy.
- Deepseek V2: For modern architectural choices like low-rank attention and SwiGLU.