
fadyphil/git-rs


Git-rs

Demystifying Version Control from First Principles

A from-scratch implementation of Git's core object storage engine in Rust.


Overview • Architecture • Quick Start • Roadmap


📖 Overview

git-rs is not intended to replace Git. It is a surgical exploration of how version control actually works at the byte level.

By building Git's content-addressable storage, SHA-1 hashing, Zlib compression, and recursive tree serialization from first principles, this project strips away the magic and exposes the raw systems engineering underneath. It is a learning vehicle for mastering Rust's ownership model, binary serialization protocols, and Directed Acyclic Graph (DAG) traversal.

The North Star: If the official, Linus Torvalds-authored Git binary can read, parse, and verify the objects created by git-rs, the implementation is correct.


✅ Implemented Features

| Command | Description | Engineering Concepts Mastered |
| --- | --- | --- |
| init | Creates the .git/ directory skeleton and HEAD pointer. | Filesystem I/O, Path resolution |
| hash-object -w <file> | Reads a file, constructs the Git blob format, computes SHA-1, compresses with Zlib, and stores it. | Byte buffers (Vec<u8>), Cryptographic hashing, Zlib streams |
| cat-file <-p\|-t\|-s> <hash> | Locates, decompresses, parses, and displays stored objects. | Binary parsing, Null-byte delimiters, UTF-8 coercion |
| write-tree | Snapshots the current directory into a binary tree object. | Post-order DFS recursion, Binary serialization, Raw 20-byte hashing |
| commit-tree <tree-hash> -m <message> | Creates a commit object with author/committer signatures and parent references. | Commit object format, Signature serialization, DAG parent linking |
| commit -m <message> | Porcelain commit: resolves HEAD, snapshots the working directory into a tree, creates a commit with the correct parent, and updates the current ref. | Plumbing vs porcelain, Refs (HEAD/refs/heads/*), Branch tip mutation |
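
The HEAD resolution step of the porcelain commit can be sketched with the standard library alone. This is a minimal illustration, not git-rs's actual code: the `Head` enum and the `parse_head` and `ref_to_update` helpers are hypothetical names, and the sketch assumes HEAD holds either a `ref: ` line or a bare 40-character hex ID.

```rust
/// What `.git/HEAD` can point at (illustrative type, not git-rs's own).
#[derive(Debug, PartialEq)]
enum Head {
    /// `ref: refs/heads/<branch>`: HEAD follows a branch.
    Symbolic(String),
    /// A bare 40-character hex ID: detached HEAD.
    Detached(String),
}

/// Parse the contents of `.git/HEAD` (hypothetical helper).
fn parse_head(contents: &str) -> Option<Head> {
    let line = contents.trim_end();
    if let Some(target) = line.strip_prefix("ref: ") {
        return Some(Head::Symbolic(target.to_string()));
    }
    let is_hex_id = line.len() == 40 && line.bytes().all(|b| b.is_ascii_hexdigit());
    if is_hex_id {
        Some(Head::Detached(line.to_string()))
    } else {
        None
    }
}

/// File to rewrite with the new commit ID, relative to `.git/`.
fn ref_to_update(head: &Head) -> &str {
    match head {
        // On a branch, the branch file moves and HEAD stays untouched.
        Head::Symbolic(target) => target.as_str(),
        // Detached: HEAD itself stores the commit ID.
        Head::Detached(_) => "HEAD",
    }
}
```

With HEAD containing `ref: refs/heads/main`, the commit's parent is whatever ID `refs/heads/main` currently holds (none if the file does not exist yet), and that same file receives the new commit ID afterwards.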

🛠️ Architecture & Design

The Byte-Level Contract

Git does not use JSON, XML, or high-level abstractions. It relies on a strict, continuous stream of bytes. git-rs respects this contract exactly:

┌────────────────────────────────────────────────────────────┐
│  THE GIT OBJECT CONTRACT (In RAM before Zlib Compression)  │
├────────────────────────────────────────────────────────────┤
│  [ HEADER ]                                                │
│  "tree 67\0"  ◄── ASCII Text + Null Terminator             │
│                                                            │
│  [ BINARY PAYLOAD ]                                        │
│  "100644 README.md\0" + [20 Raw SHA-1 Bytes]               │
│  "40000 src\0"        + [20 Raw SHA-1 Bytes]               │
└────────────────────────────────────────────────────────────┘
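
The layout above can be sketched as a small serializer. This is a minimal illustration under stated assumptions, not the code in git-rs: `TreeEntry` and `serialize_tree` are hypothetical names, and the hash bytes are passed in rather than computed. Two details follow canonical Git: directory modes are written as `40000` without a leading zero (`git cat-file -p` pads them to `040000` only for display), and entries are ordered by name with directories compared as if their name ended in `/`.

```rust
/// One entry of a tree object (illustrative type).
struct TreeEntry {
    mode: &'static str, // "100644" for a regular file, "40000" for a directory
    name: String,
    id: [u8; 20], // raw SHA-1 bytes, not the 40-char hex form
}

impl TreeEntry {
    /// Sort key: name bytes, with '/' appended for directories.
    fn sort_key(&self) -> Vec<u8> {
        let mut key = self.name.as_bytes().to_vec();
        if self.mode == "40000" {
            key.push(b'/');
        }
        key
    }
}

/// Build the uncompressed tree object: "tree <size>\0" followed by the entries.
fn serialize_tree(mut entries: Vec<TreeEntry>) -> Vec<u8> {
    entries.sort_by_key(|e| e.sort_key());
    let mut payload = Vec::new();
    for e in &entries {
        payload.extend_from_slice(e.mode.as_bytes());
        payload.push(b' ');
        payload.extend_from_slice(e.name.as_bytes());
        payload.push(0); // NUL separates the name from the raw hash
        payload.extend_from_slice(&e.id);
    }
    // The size in the header counts payload bytes only.
    let mut object = format!("tree {}\0", payload.len()).into_bytes();
    object.extend_from_slice(&payload);
    object
}
```

The returned buffer is what gets SHA-1 hashed; the Zlib-compressed copy of the same bytes is what would be written to disk.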

Commit Object Format

Commit objects use a human-readable ASCII format with key-value headers:

┌────────────────────────────────────────────────────────────┐
│  COMMIT OBJECT (ASCII text, not binary)                    │
├────────────────────────────────────────────────────────────┤
│  tree <tree-hash>\n                                        │
│  parent <parent-hash>\n  (one line per parent, if any)     │
│  author <name> <<email>> <timestamp> <timezone>\n          │
│  committer <name> <<email>> <timestamp> <timezone>\n       │
│  \n                                                        │
│  <commit message>\n                                        │
└────────────────────────────────────────────────────────────┘
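
A minimal sketch of that format, assuming the signatures arrive pre-formatted as `<name> <email> <timestamp> <timezone>` strings. `serialize_commit` and its parameter names are hypothetical, not git-rs's API. A root commit passes no parents, an ordinary commit passes one, and a merge passes two or more.

```rust
/// Build the uncompressed commit object: "commit <size>\0" followed by the text body.
fn serialize_commit(
    tree: &str,
    parents: &[&str],
    author: &str,
    committer: &str,
    message: &str,
) -> Vec<u8> {
    let mut body = format!("tree {}\n", tree);
    // One "parent" header per parent, in order; none for a root commit.
    for parent in parents {
        body.push_str(&format!("parent {}\n", parent));
    }
    body.push_str(&format!("author {}\n", author));
    body.push_str(&format!("committer {}\n", committer));
    body.push('\n'); // blank line separates headers from the message
    body.push_str(message);
    if !message.ends_with('\n') {
        body.push('\n');
    }
    let mut object = format!("commit {}\0", body.len()).into_bytes();
    object.extend_from_slice(body.as_bytes());
    object
}
```

Because the timestamp is part of the hashed bytes, two commits with the same tree and message still get different IDs when they are created at different times.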

Core Systems Concepts

  • Content-Addressable Storage: Every object is stored at .git/objects/XX/YYY..., where XX is the first 2 hex characters of its SHA-1 hash and YYY... is the remaining 38. Identical content always hashes to the same ID, so deduplication follows from the addressing scheme itself, not from heuristics.
  • Post-Order DAG Traversal: Because a directory's hash is derived from the hashes of its children, write-tree uses a recursive post-order depth-first search: children are hashed first, and their hashes bubble up the call stack to the parent.
  • Strict Format Compliance: Objects are stored exactly as official Git expects: "<type> <size>\0<content>". The SHA-1 is computed over these uncompressed bytes; the Zlib-compressed form is what is written to disk.
  • Rust-Native Memory Model: Explicit ownership, &[u8] slice borrowing, Result-based error propagation, and Box<dyn Error> for unified failure handling. There is no garbage collector, and allocations (Vec, Box) are explicit in the code.
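
The framing and addressing rules in the list above can be sketched without any dependencies. This is a minimal illustration: `frame_object`, `parse_object`, and `object_path` are hypothetical helper names, and the SHA-1 and Zlib steps (handled by the sha1 and flate2 crates in the project) are deliberately left out.

```rust
use std::path::PathBuf;

/// Wrap raw content as "<type> <size>\0<content>". These are the bytes that
/// get hashed, and separately Zlib-compressed for storage.
fn frame_object(kind: &str, content: &[u8]) -> Vec<u8> {
    let mut object = format!("{} {}\0", kind, content.len()).into_bytes();
    object.extend_from_slice(content);
    object
}

/// Split a decompressed object into (type, content), borrowing from the input.
fn parse_object(object: &[u8]) -> Option<(&str, &[u8])> {
    let nul = object.iter().position(|&b| b == 0)?;
    let header = std::str::from_utf8(&object[..nul]).ok()?;
    let (kind, size) = header.split_once(' ')?;
    let content = &object[nul + 1..];
    if size.parse::<usize>().ok()? != content.len() {
        return None; // declared size must match the actual payload
    }
    Some((kind, content))
}

/// Map a 40-char hex ID to its loose-object path: .git/objects/XX/YYY...
fn object_path(hex_id: &str) -> Option<PathBuf> {
    if hex_id.len() != 40 || !hex_id.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let (dir, file) = hex_id.split_at(2);
    Some(PathBuf::from(".git").join("objects").join(dir).join(file))
}
```

`parse_object` returns slices into the caller's buffer rather than copies, which is the kind of &[u8] borrowing the last bullet refers to.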

📦 Dependencies

To enforce a deep understanding of the standard library, external dependencies are strictly limited to the bare minimum required for cryptography and compression:

[dependencies]
sha1 = "0.10"    # Cryptographic hashing
flate2 = "1.0"   # Zlib compression/decompression
hex = "0.4"      # Hex encoding utilities

🚀 Quick Start

# Clone and build the project
git clone <repo-url> && cd git-rs
cargo build --release

# Initialize a test repository
mkdir test-repo && cd test-repo
../target/release/git-rs init

# Store a file
echo "Hello Git Internals" > test.txt
../target/release/git-rs hash-object -w test.txt
# → b6fc4c620b67d95f953a5c1c1230aaab5db5a1b0

# Snapshot the directory
../target/release/git-rs write-tree
# → <tree-hash>  (40-character ID of the new tree object)

# Create a commit from the tree
../target/release/git-rs commit-tree <tree-hash> -m "Initial commit"
# → <commit-hash>  (varies with author, committer, and timestamp)

🔍 Verification & Interoperability

Every phase is verified against the official git CLI. The ultimate test of interoperability:

# Read a tree object created by git-rs using the official Git binary
git cat-file -p <tree-hash>

# Expected Output:
# 100644 blob b6fc4c...    test.txt

# Verify a commit object created by git-rs
git cat-file -p <commit-hash>

# Expected Output:
# tree <tree-hash>
# author Fady <fady@test.com> 1718000000 +0000
# committer Fady <fady@test.com> 1718000000 +0000
#
# Initial commit

If official Git can read the object database, the on-disk format is compatible with Git's own.


📚 Documentation

| Document | Phase | Description |
| --- | --- | --- |
| Git Internals | 1–3 | Content-addressable storage, blob/tree objects, binary serialization |
| Engineering Journal | 1–3 | Development log: decisions, trade-offs, and lessons learned |
| Lessons Learned | 1–3 | Key takeaways from implementing Git's object model |
| Build Git in Rust Guide | 1–3 | Step-by-step guide for building Git from scratch in Rust |
| Architecture Documentation | 3 | Visual guides to the Git object model, DAG, and write-tree algorithm |
| Commit Objects & the DAG | 4 | Commit object format, DAG structure, parent references, and serialization |
| DAG & Commit Serialization | 4 | Deep dive: DAG mathematics, commit serialization pipeline, content deduplication |
| Porcelain Commit & Refs | 5 | commit workflow, HEAD resolution, branch pointer mutation, plumbing vs porcelain |

🗺️ Roadmap

| Phase | Feature | Status |
| --- | --- | --- |
| 1 | init & .git/ structure | ✅ Complete |
| 2 | hash-object, cat-file & object storage | ✅ Complete |
| 3 | write-tree & binary serialization | ✅ Complete |
| 4 | commit-tree & DAG parent references | ✅ Complete |
| 5 | commit workflow & refs/HEAD management | ✅ Complete |
| 6 | export-snapshot & LLM Wiki integration | 🔲 Planned |

📚 Project Structure

src/
├── main.rs          # CLI dispatcher, argument routing, and command execution
├── object.rs        # SHA-1 hashing, Zlib compression, read/write objects
├── tree.rs          # Recursive directory walking, binary tree serialization
├── commit.rs        # Commit creation, signature serialization, DAG parent linking
├── store.rs         # Object database read/write abstraction
└── refs.rs          # HEAD pointer, branch reference read/write
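
The recursive directory walk attributed to tree.rs relies on post-order hashing: a directory can only be hashed once all of its children have IDs. The sketch below shows that ordering on an in-memory tree. It is an illustration only: `Node`, `toy_id`, and `hash_node` are hypothetical names, the listing format is simplified, and `toy_id` is a 64-bit stand-in from the standard library, not SHA-1.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// In-memory stand-in for a working directory.
enum Node {
    File(Vec<u8>),
    Dir(Vec<(String, Node)>),
}

/// STAND-IN for SHA-1: a 64-bit std hash, used only to keep the sketch
/// dependency-free. Real Git IDs are 20-byte SHA-1 digests.
fn toy_id(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Post-order walk: every child is hashed before its parent directory,
/// because the parent's bytes embed the children's IDs.
fn hash_node(node: &Node, visit_order: &mut Vec<String>, name: &str) -> u64 {
    match node {
        Node::File(content) => {
            visit_order.push(name.to_string());
            toy_id(content)
        }
        Node::Dir(children) => {
            let mut listing = Vec::new();
            for (child_name, child) in children {
                let child_id = hash_node(child, visit_order, child_name);
                listing.extend_from_slice(child_name.as_bytes());
                listing.push(0);
                listing.extend_from_slice(&child_id.to_be_bytes());
            }
            visit_order.push(name.to_string()); // the parent finishes last
            toy_id(&listing)
        }
    }
}
```

Changing a single file changes its ID, which changes the listing of every directory above it, so the root ID changes too; unchanged subtrees keep their IDs and can be reused.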

### Mermaid Diagrams

Throughout the documentation, Mermaid diagrams are used to visualize Git's internal structures and data flow. These diagrams are rendered natively by GitHub when viewing the markdown files.

Example (DAG Structure):

graph TD
    A["Commit A<br/>(Initial)"] --> B["Commit B"]
    B --> C["Commit C"]
    B --> D["Commit D"]
    C --> E["Commit E<br/>(Merge)"]
    D --> E
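
The arrows in the diagram run in chronological order, while a commit object records the opposite direction: each commit names its parents. A history walk therefore starts at a tip and follows parent links backwards. The sketch below assumes a hypothetical in-memory index `parents_of` (commit ID to parent IDs) rather than reading commit objects from disk, and `ancestors` is an illustrative name.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Collect every commit reachable from `tip` by following parent links,
/// breadth-first. Each commit is reported once, even below a merge.
fn ancestors<'a>(tip: &'a str, parents_of: &HashMap<&'a str, Vec<&'a str>>) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    let mut queue = VecDeque::from([tip]);
    let mut order = Vec::new();
    while let Some(id) = queue.pop_front() {
        if !seen.insert(id) {
            continue; // already reached through another parent
        }
        order.push(id);
        if let Some(parents) = parents_of.get(id) {
            queue.extend(parents.iter().copied());
        }
    }
    order
}
```

For the graph above, walking from the merge commit E visits E, C, D, B, A: commit B is reachable through both C and D but is listed only once.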

Example (Commit Object Format):

flowchart LR
    Tree[Tree Hash] --> Headers[Key-Value Headers]
    Parent[Parent Hash] --> Headers
    Author[Author Info] --> Headers
    Committer[Committer Info] --> Headers
    Headers --> Blank[Blank Line]
    Blank --> Message[Commit Message]
    Message --> SHA[SHA-1 Hash + Zlib Compress]
    SHA --> Store[.git/objects/]


🧠 Learning Objectives

This project is a deliberate exercise in systems programming:

  1. Memory & Ownership: Master Rust's borrow checker, &[u8] slices, and zero-copy parsing.
  2. Binary Protocols: Implement strict serialization (null-byte separators, raw 20-byte hashes vs 40-char hex strings).
  3. Graph Theory: Understand how Directed Acyclic Graphs (DAGs) enforce history integrity and enable deduplication. Commit objects form the vertices of the DAG, with parent references creating the edges.
  4. CLI Architecture: Build a production-grade dispatcher with strict argument validation and clean error propagation.
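
The difference between raw 20-byte hashes and 40-character hex strings in objective 2 comes down to a pair of conversions. The project lists the hex crate for this; the std-only sketch below uses hypothetical helpers `to_hex` and `from_hex` purely to show what the conversion does.

```rust
/// Encode 20 raw hash bytes as the 40-char lowercase hex form shown by the CLI.
fn to_hex(raw: &[u8; 20]) -> String {
    raw.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Decode a 40-char hex ID into the 20 raw bytes stored inside tree objects.
fn from_hex(hex: &str) -> Option<[u8; 20]> {
    if hex.len() != 40 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut raw = [0u8; 20];
    for (i, byte) in raw.iter_mut().enumerate() {
        // Each pair of hex characters encodes one byte.
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(raw)
}
```

Tree entries embed the raw form, which is why a tree object cannot be read as plain text, while commit objects and the CLI use the hex form.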

Built following the Build Git From Scratch in Rust blueprint.

Documentation: Phase 4: Commit Objects & DAG · Phase 4: DAG Deep Dive · Phase 5: Porcelain Commit & Refs

This project is a learning vehicle for systems programming. Not intended for production use.

About

Git core functionality in Rust: blob (binary large object), tree, and commit objects
