teatonedev/Llama3.2-From-Scratch

LLAMA From Scratch

llamaFromScratchReadMePicture.png

Getting Started

1. Go to the Llama downloads page: https://www.llama.com/llama-downloads/

2. Select the model you want to use.

3. Submit a request for that model; once it is approved, you will receive a link to download the model locally.

4. Specify the model and tokenizer paths in the code.

5. Enjoy experimenting & have fun. (Literally)
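The path setup in step 4 might look like the sketch below. The directory and file names here are hypothetical examples; adjust them to wherever the download link from step 3 placed the files.

```python
from pathlib import Path

# Hypothetical locations -- change these to match your local download.
MODEL_DIR = Path.home() / "llama-models" / "Llama3.2-1B"
MODEL_PATH = MODEL_DIR / "consolidated.00.pth"   # model weights
TOKENIZER_PATH = MODEL_DIR / "tokenizer.model"   # BPE tokenizer

def check_paths():
    """Fail early with a clear message if the files are missing."""
    for p in (MODEL_PATH, TOKENIZER_PATH):
        if not p.exists():
            raise FileNotFoundError(f"Expected file not found: {p}")
```

Checking the paths up front avoids a confusing failure deep inside the weight-loading code.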

Architecture

llama321BPicture.png

Llama 3.2 1B is a 16-layer decoder-only transformer with an embedding dimension of 2048, a feed-forward dimension of 8192, and 32 attention heads. By stacking repeated attention/feed-forward blocks, RMSNorm layers, and a final linear projection, the model learns to generate text autoregressively over a context length of 128k+ tokens.

Raw text is converted into integer IDs by a Byte Pair Encoding (BPE) tokenizer; each ID corresponds to a subword or a special token. Since the embedding dimension is 2048, each token ID is mapped to a learnable 2048-D vector. This embedding layer transforms discrete token indices into continuous representations that the transformer can process.

Llama 3.2 uses rotary positional embeddings (RoPE) to inject position information into the model, a trick that lets large context lengths be modeled effectively: each embedding vector is rotated based on its position in the sequence. Investigate the complex-numbers part closely to build intuition for this; it will make much more sense. After all the layers, a final normalization and a projection to the logits produce the next-token prediction.
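The complex-numbers view of RoPE mentioned above can be sketched in a few lines: adjacent pairs of vector components are treated as complex numbers and multiplied by a position-dependent unit phasor. This is a minimal, dependency-free illustration of the idea, not the repository's actual implementation (which presumably operates on batched tensors); the base frequency `theta_base` is an assumption here.

```python
import cmath

def rope_rotate(x, pos, theta_base=10000.0):
    """Rotate an even-length vector x by position-dependent angles.

    Each pair (x[2i], x[2i+1]) is read as a complex number and multiplied
    by e^{j * pos * freq_i}, where freq_i = theta_base^{-2i/d}. The rotation
    is a pure phase change, so vector norms are preserved.
    """
    d = len(x)
    out = []
    for i in range(0, d, 2):
        freq = theta_base ** (-i / d)
        z = complex(x[i], x[i + 1]) * cmath.exp(1j * pos * freq)
        out.extend([z.real, z.imag])
    return out
```

Because rotation only changes phases, the dot product between a rotated query at position m and a rotated key at position n depends only on the offset m - n, which is what makes RoPE a relative position encoding.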

Example Usage

Run the program and enter a prompt, as shown in the example.

prompt1.png

When you press Enter, the model predicts the next most likely token, "john", as shown in the image below.

prompt1Output.png

For multi-token generation, with the input "Meta's Llama models are":

multiTokenOutput.png

The output is "Generated Sequence: <|begin_of_text|>Meta's Llama models are a great way to get started with GPT-3. This tutorial will show you how to use the Llama models to generate text." as shown in the image.

multiTokenOutputSequence.png
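Multi-token generation like the example above boils down to a loop that repeatedly feeds the sequence back into the model and appends the predicted token. Here is a minimal greedy-decoding sketch; the `model` interface (a callable returning next-token logits as a list) and the toy stand-in model are assumptions for illustration, not the repository's API.

```python
def generate(model, ids, max_new_tokens, eos_id=None):
    """Greedy autoregressive decoding: append the argmax token each step."""
    for _ in range(max_new_tokens):
        logits = model(ids)  # scores for the next token, one per vocab entry
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids = ids + [next_id]
        if next_id == eos_id:
            break
    return ids

# Toy stand-in model: always prefers token (last_id + 1) mod vocab_size.
def toy_model(ids, vocab_size=5):
    logits = [0.0] * vocab_size
    logits[(ids[-1] + 1) % vocab_size] = 1.0
    return logits
```

A real run would also decode the final IDs back to text with the tokenizer, and sampling (temperature, top-p) could replace the argmax for more varied output.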

Warning & Issues

The code & repository are still under development, so the model can occasionally produce unreasonable results, such as stray numbers. Keep this in mind when experimenting.

About

This repository serves as an educational resource for building an open-source LLaMA model from scratch, with step-by-step explanations throughout the process.
