🚀 AI-Ready Text Extractor for Git Repos | CLI tool for dataset prep, summaries, reverse engineering & bundling
Gittxt is an open-source tool that transforms GitHub repositories into LLM-compatible datasets.
Perfect for developers, data scientists, and AI engineers, Gittxt helps you extract and structure .txt, .json, .md content into clean, analyzable formats for use in:
- Prompt engineering
- Fine-tuning & retrieval
- Codebase summarization
- Open-source LLM workflows
Large Language Models often expect input in very specific formats. Many tools (e.g., ChatGPT, Gemini, Ollama) struggle with arbitrary GitHub URLs, complex folders, or non-text assets.
Gittxt bridges this gap by:
- Extracting all usable text from a repo
- Organizing it for easy ingestion by LLMs
- Offering structured .txt, .json, .md, .zip outputs
- Giving you full control with filtering, formatting, and plugin support
- ✅ Text extractor for code, docs, config files
- ✅ Output: .txt, .json, .md, .zip
- ✅ CLI and plugin system (FastAPI, Streamlit)
- ✅ AI-ready summaries (OpenAI / Ollama)
- ✅ Reverse engineer .txt/.json reports back into repo structure
- ✅ .gittxtignore support
- ✅ Async scanning for large projects
- ✅ Works offline and in constrained compute environments
outputs/
├── txt/ # Plain text report
├── json/ # Structured metadata
├── md/ # Markdown-formatted summary
└── zip/ # Bundled results + manifest
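Once a scan has finished, the layout above can be inspected from Python. The following is a minimal sketch, not part of Gittxt itself: it assumes only the four subfolder names shown in the tree, and the helper name `collect_outputs` is hypothetical.

```python
from pathlib import Path

# Subfolder names taken from the directory tree above.
FORMAT_DIRS = ("txt", "json", "md", "zip")


def collect_outputs(root):
    """Map each format folder to the sorted file names found inside it.

    Folders that do not exist are reported with an empty list.
    """
    root = Path(root)
    found = {}
    for name in FORMAT_DIRS:
        folder = root / name
        if folder.is_dir():
            found[name] = sorted(p.name for p in folder.iterdir() if p.is_file())
        else:
            found[name] = []
    return found


if __name__ == "__main__":
    # "outputs" is the top-level folder name shown in the tree.
    for fmt, files in collect_outputs("outputs").items():
        print(f"{fmt}: {files}")
```

A missing folder is treated as empty rather than as an error, since a scan may be run with only some of the formats enabled.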
pip install gittxt
gittxt scan https://github.com/sandy-sp/gittxt --output-format txt,json --lite --zip
gittxt re outputs/project.md -o ./restored
Try the hosted version (no install required!)
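For scripted use, the scan command above could be assembled in Python before handing it to a subprocess. This is a rough sketch that uses only the flags appearing in the example (`--output-format`, `--lite`, `--zip`); the function name `build_scan_command` and its parameters are hypothetical, and the full flag set should be checked against the CLI documentation.

```python
import shlex


def build_scan_command(repo_url, formats=("txt", "json"), lite=True, zip_bundle=True):
    """Return an argv list for `gittxt scan`, mirroring the example command."""
    cmd = ["gittxt", "scan", repo_url, "--output-format", ",".join(formats)]
    if lite:
        cmd.append("--lite")
    if zip_bundle:
        cmd.append("--zip")
    return cmd


if __name__ == "__main__":
    argv = build_scan_command("https://github.com/sandy-sp/gittxt")
    # Print the command for review; to execute it, pass argv to
    # subprocess.run(argv, check=True) with gittxt installed.
    print(shlex.join(argv))
```

Building the command as a list rather than a single string avoids shell quoting problems when a repository URL or path contains unusual characters.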
- Use it to build structured input for LLMs
- Ideal for prompt chaining, document agents, code summarization
- Helps transform messy repos into single-file, AI-consumable reports
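A single-file report is often still too long for one prompt, so a common next step is to split it into chunks. The sketch below is one simple way to do that and is not a Gittxt feature: `chunk_text`, the `max_chars` budget, and the report path `outputs/txt/project.txt` are all illustrative assumptions.

```python
from pathlib import Path


def chunk_text(text, max_chars=4000):
    """Split text into chunks of at most max_chars characters.

    Chunks break on line boundaries where possible; a single line longer
    than max_chars is cut into hard slices.
    """
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        while len(line) > max_chars:
            # Flush whatever is pending, then emit a hard slice of the long line.
            if current:
                chunks.append("".join(current))
                current, size = [], 0
            chunks.append(line[:max_chars])
            line = line[max_chars:]
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks


if __name__ == "__main__":
    # Hypothetical report path; adjust to whatever the scan actually produced.
    report = Path("outputs/txt/project.txt")
    if report.exists():
        parts = chunk_text(report.read_text(encoding="utf-8"))
        print(f"{len(parts)} chunk(s) ready for prompting")
```

Character counts are only a rough proxy for token limits; a tokenizer-based budget would be more precise for a specific model.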
All CLI flags, plugins, formats, and filters are documented here:
Gittxt supports modular plugins:
- gittxt-api: Run via FastAPI backend
- gittxt-streamlit: Interactive dashboard
Install & run with:
gittxt plugin install gittxt-streamlit
gittxt plugin run gittxt-streamlit
Created by Sandeep Paidipati, Gittxt was born out of a need to:
- Quickly preview and summarize GitHub repos with LLMs
- Avoid manual copying, filtering, and converting files
- Create AI-ready datasets for learning and experimentation
- ⭐️ Star this repo if it helped you
- 🧵 Share it with your dev/AI community
- 🤝 Contact me for collaboration or sponsorship
MIT License © Sandeep Paidipati
Gittxt — Get Text from Git — Optimized for AI