Turns your daily engineering notes and git commits into platform-ready LinkedIn posts, Twitter threads, and technical blog posts — using a 7-node LangGraph pipeline backed by Qwen2.5-72B on HuggingFace Inference.
You write raw notes at the end of a dev session:
Fixed websocket buffer flush threshold today.
Latency dropped from 820ms to 580ms.
VAD still triggering on background noise — not fixed yet.
The engine extracts the engineering story, picks a narrative angle, applies your writing style, and generates:
- A LinkedIn post with a scroll-stopping hook and your actual numbers
- A Twitter/X thread structured for developer engagement
- A full 1200–1500 word technical blog post (two-stage: blueprint → draft)
Built in public as a personal tool, designed to scale into a SaaS product.
Developer Notes + Git Log
│
▼
run_pipeline_service()
│
├── Memory Search (ChromaDB — finds similar past content)
│
▼
LangGraph Pipeline (7 nodes)
│
├── parse_notes ──→ Cache Layer (SHA-256 hash lookup)
├── parse_git ──→ Cache Layer
├── context_builder (pure transform, no LLM)
├── angle_generator ──→ Cache Layer
├── style_selector (loads creator style profile, ~1ms)
├── blog_blueprint (stage 1 of 2-stage blog, LLM)
└── post_generator (LinkedIn + Twitter + Blog, separate LLM call per platform)
│
▼
Memory Store (saves run for future similarity search)
│
▼
generated_posts → FastAPI → Streamlit UI
Stack: Python 3.10+ · FastAPI · LangGraph · LangChain · Qwen2.5-72B · HuggingFace Inference Router · ChromaDB · Streamlit · Pydantic-Settings · structlog · uv
A deterministic directed graph where each node has a single responsibility. Nodes communicate through a typed PipelineState TypedDict — no globals, no side effects. Errors in one node don't kill the pipeline.
The LLM backend uses HuggingFace's OpenAI-compatible router (router.huggingface.co/v1). Swap inference providers without touching code — append :sambanova, :together, or :nebius to the model name in .env.
LinkedIn and Twitter have different audiences making different content decisions. The prompt system encodes this explicitly: LinkedIn gets a storytelling-first prompt with audience psychology context; Twitter gets a punchy, hook-first thread structure. Same engineering input produces platform-native output for each.
Four profiles ship out of the box (dhruv_default, swyx, primagen, levelsio). Each .md file describes hook patterns, sentence rhythm, tone rules, and structural patterns. The style loader injects these into every generation prompt. Add your own by creating creator_styles/yourname.md — no code changes.
Single-call blog generation produces flat summaries. This pipeline splits it:
- Stage 1 (
blog_blueprint_node): LLM plans title options, section structure, hook paragraph, key technical points, what to avoid. - Stage 2 (
post_generator): LLM writes 1200–1500 words against the blueprint.
Users can paste extra material (YouTube transcripts, reference blog links, docs) to enrich the blog source material.
SHA-256 content-addressable cache for parse_notes, parse_git, and angle_generator nodes. Same input = same hash = instant return from JSON file. Configurable TTL (default 24h). Every API response includes cache_hits in metadata showing exactly which nodes were served from cache vs live LLM calls. Expected reduction: 40–70% in cost and latency for repeated or similar inputs.
Stores generated posts as embeddings in a local ChromaDB vector database. Before each pipeline run, searches for semantically similar past content. If a match exceeds the configurable similarity threshold (default: 0.82), the past posts are injected into the generation prompt as a style reference — not to copy, but to calibrate tone and structure. Memory search happens outside the pipeline so failures are invisible to the user.
Every configurable value — API keys, model ID, inference provider, temperature, max tokens, cache TTL, memory threshold, ports, CORS — lives in settings.py as a typed Pydantic field. No hardcoded values anywhere in the codebase. Environment: .env file in development, system environment variables in production via systemd.
structlog writes JSON-formatted logs to both terminal and file. Every node logs node_started and node_completed events with timing in milliseconds. Production logs are queryable and machine-parseable.
content_engine/
├── backend/
│ ├── api/
│ │ └── main.py # FastAPI app — 7 endpoints
│ ├── cache/
│ │ └── cache_manager.py # SHA-256 hash cache, TTL, stats
│ ├── config/
│ │ └── settings.py # All config — Pydantic-Settings
│ ├── ingestion/
│ │ ├── dump_parser.py # Notes file reader + cleaner
│ │ └── git_parser.py # GitPython commit history reader
│ ├── llm/
│ │ ├── prompts.py # All prompt templates
│ │ ├── providers.py # LLM abstraction — HuggingFaceProvider
│ │ └── style_loader.py # Creator style profile loader
│ ├── memory/
│ │ └── content_memory.py # ChromaDB semantic memory layer
│ ├── services/
│ │ └── run_pipeline.py # Orchestration — memory search + graph invoke
│ └── utils/
│ └── logger.py # structlog JSON logger
├── creator_styles/
│ ├── dhruv_default.md # Personal style profile
│ ├── swyx.md
│ ├── primagen.md
│ └── levelsio.md
├── frontend/
│ └── streamlit_app.py # 3-page UI: Generate, Blog Studio, Settings
├── inputs/
│ └── today_dump.txt # Drop your daily notes here
├── pipeline/
│ ├── graph.py # LangGraph StateGraph assembly
│ ├── state.py # PipelineState TypedDict
│ └── nodes/
│ ├── parse_notes.py # Stage 1: LLM extracts from notes
│ ├── parse_git.py # Stage 2: LLM extracts from commits
│ ├── context_builder.py # Stage 3: Pure merge, no LLM
│ ├── angle.py # Stage 4: Narrative angle selection
│ ├── style_selector.py # Stage 5: Load creator style
│ ├── blog_blueprint.py # Stage 6: Blog structure plan
│ └── post_generator.py # Stage 7: Platform content generation
├── pyproject.toml
├── setup.sh
├── start.sh
└── .env.example
- Python 3.10+
- Git
- A HuggingFace account (free)
git clone https://github.com/YOUR_USERNAME/content-engine.git
cd content-engine
chmod +x setup.sh start.sh
./setup.sh- Go to huggingface.co/settings/tokens
- Create a Fine-grained token
- Enable permission: Make calls to Inference Providers
- Copy the token (starts with
hf_...)
cp .env.example .envEdit .env and set at minimum:
HF_TOKEN=hf_your_token_here
Optional — pin to a specific inference provider for consistent latency:
LLM_MODEL=Qwen/Qwen2.5-72B-Instruct:sambanova
./start.sh- Streamlit UI: http://localhost:8501
- API docs (Swagger): http://localhost:8000/docs
source .venv/bin/activate
uv pip install chromadbThen in .env:
MEMORY_ENABLED=true
| Method | Endpoint | Description |
|---|---|---|
POST |
/generate |
Generate content from raw notes + git |
POST |
/generate/file |
Generate from server-side notes file |
GET |
/health |
Health check including cache + memory stats |
GET |
/styles |
List available creator style profiles |
GET |
/models |
List available LLM models |
GET |
/cache/stats |
Cache file count, size, TTL info |
POST |
/cache/clear |
Delete cache files (all or older than N hours) |
GET |
/memory/stats |
ChromaDB entry count and similarity threshold |
curl -X POST http://localhost:8000/generate \
-H "Content-Type: application/json" \
-d '{
"raw_notes": "Fixed websocket buffer today. Latency 820ms → 580ms. Buffer flush threshold was 4KB, changed to 1KB.",
"platforms": ["linkedin", "twitter"],
"style": "dhruv_default"
}'{
"success": true,
"generated_posts": {
"linkedin": "820ms → 580ms.\n\nThat was the latency drop after one config change...",
"twitter": "1/ My voice pipeline had 820ms latency.\n\nThe model wasn't slow..."
},
"metadata": {
"model": "Qwen/Qwen2.5-72B-Instruct",
"cache_hits": ["parse_notes", "parse_git"],
"cached_node_count": 2,
"memory_hit": false,
"narrative_angle": "PERFORMANCE_BREAKTHROUGH",
"style_used": "dhruv_default",
"total_service_duration_ms": 8240
}
}All settings are environment variables, documented in .env.example.
| Variable | Default | Description |
|---|---|---|
HF_TOKEN |
required | HuggingFace access token |
LLM_MODEL |
Qwen/Qwen2.5-72B-Instruct |
Model ID, optionally with :provider |
HF_BASE_URL |
https://router.huggingface.co/v1 |
Inference router endpoint |
LLM_TEMPERATURE |
0.7 |
Generation randomness (0.0–2.0) |
LLM_MAX_TOKENS |
2500 |
Max tokens per response |
LLM_REQUEST_TIMEOUT |
120 |
API timeout in seconds |
CACHE_ENABLED |
true |
Enable deterministic cache |
CACHE_DIR |
cache |
Cache file directory |
CACHE_TTL_HOURS |
24 |
Cache entry lifetime in hours |
MEMORY_ENABLED |
false |
Enable ChromaDB semantic memory |
MEMORY_DIR |
memory |
ChromaDB storage directory |
MEMORY_SIMILARITY_THRESHOLD |
0.82 |
Minimum similarity for memory match |
APP_ENV |
development |
development or production |
API_PORT |
8000 |
FastAPI port |
STREAMLIT_PORT |
8501 |
Streamlit port |
LOG_LEVEL |
INFO |
DEBUG, INFO, WARNING, ERROR |
Create creator_styles/yourname.md following this structure:
# Your Name — Creator Style Profile
## Voice
[Describe overall tone and approach]
## Hook Patterns
- "Pattern 1"
- "Pattern 2"
## Sentence Rhythm
[Short? Long? Mixed? Describe the cadence]
## Structure Pattern
[Hook → Context → Insight → Detail → Lesson]
## Tone Rules
- What to always do
- What to never say
## Technical Density
[Low / Medium / High — and why]Pass style=yourname in any API request. No code changes needed.
Why LangGraph instead of a simple function chain?
Each node is independently testable, replaceable, and observable. The PipelineState TypedDict makes data flow explicit — you can inspect what any node received and what it returned. Adding a new node means adding one function and two lines in graph.py.
Why separate LLM calls per platform instead of one combined call? A single call producing LinkedIn + Twitter + Blog in one response produces noticeably worse quality. Each platform has different length constraints, different audience psychology, and different structural requirements. Separate calls let each prompt be fully optimized for its target.
Why SHA-256 for the cache key? Content-addressable caching: same input always maps to the same key, different input always maps to a different key. No collision risk in practice. The hash is computed in microseconds — negligible overhead before a 10-second LLM call.
Why is memory search outside the LangGraph pipeline?
Memory is advisory. If the search fails (ChromaDB unavailable, import error), the pipeline should continue normally with zero user-visible impact. Wrapping it in the service layer with a broad try/except achieves this. Inside a pipeline node, an exception would stall the graph.
Why total=False in PipelineState?
With total=False, all TypedDict fields are optional. This lets nodes return only the fields they're responsible for — LangGraph merges partial dicts into the shared state. Without it, every node would need to return all fields.
- Trend analyzer node — scrapes top-performing dev posts, injects current topic patterns
- Style learning from examples — paste 5 posts, generate a custom style profile
- Async pipeline execution — parallel node execution for platforms that don't depend on each other
- Web UI cache management panel in Streamlit settings page
- Post scheduling integration (LinkedIn API, Buffer)
- Multi-day digest mode — generate weekly retrospective from 5 days of notes
MIT