AI-powered Instagram content generation that doesn't drift.

Standalone PowerShell CLI · Claude or GPT for prompt writing · Free quota via ChatGPT Plus subscription
Hand-crafting on-brand Instagram images in Canva takes 1-2 hours per post. The common "ask Claude for a prompt, paste it into ChatGPT" shortcut speeds things up but has three persistent pain points:
- Cross-app copy-paste fatigue — endless back-and-forth between tabs, manually saving drafts.
- Prompts vary wildly across sessions — same idea, different length, different style, different result.
- Regeneration drifts — ask GPT Image to tweak one thing, and it quietly changes the elements you wanted to keep.
Image Genius is a single PowerShell command that solves all three.
Image generation models trade off precision against randomness at every unspecified dimension. The shorter the prompt, the more decisions the model improvises — and the less reproducible the output.
Image Genius templates produce 600-1200 word structured prompts with 13 mandatory sections:
| Section | What it locks down |
|---|---|
| Camera setup | Focal length, aperture, depth of field, sensor type |
| Primary subject | Exhaustive description with measurable details |
| Product packaging | Preserves SKU exactly via reference image |
| Secondary props | Position, material, color, size relative to subject |
| Surface & background | Material, distance, blur characteristics |
| Spatial layout | Rule-of-thirds grid, frame percentages, geometry |
| Lighting rig | Key/fill/rim/practical lights with clock positions and color temps |
| Color palette | Every color with hex code AND coverage % |
| Material & texture map | Reflectivity, finish, special properties per surface |
| Brand color integration | Channel-specific colors woven into the scene |
| Atmospheric effects | Bokeh shape, haze density, lens artifacts |
| Mood & style anchor | Tone + photographic reference |
| Negative prompt | Explicit exclusions (text, watermarks, faces, AI tells) |
The result: outputs that are reproducible interpretations of explicit instructions rather than stochastic guesses.
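A minimal sketch of how a generated prompt might be checked against the 13 sections in the table. The section names are copied from the table; the `missingSections` helper, and the assumption that each section name appears verbatim somewhere in the prompt text, are illustrative and not taken from the Image Genius source.

```javascript
// Section names copied from the table above.
const REQUIRED_SECTIONS = [
  "Camera setup",
  "Primary subject",
  "Product packaging",
  "Secondary props",
  "Surface & background",
  "Spatial layout",
  "Lighting rig",
  "Color palette",
  "Material & texture map",
  "Brand color integration",
  "Atmospheric effects",
  "Mood & style anchor",
  "Negative prompt",
];

// Hypothetical helper: returns the section names that do not
// appear anywhere in the prompt text (case-insensitive).
function missingSections(promptText) {
  const haystack = promptText.toLowerCase();
  return REQUIRED_SECTIONS.filter(
    (name) => !haystack.includes(name.toLowerCase())
  );
}

const draft =
  "Camera setup: 85mm, f/2.8\nLighting rig: key light from 10 o'clock at 5200K";
console.log(missingSections(draft).length); // 11
```

A check like this only confirms that a section is present, not that its content is specific enough.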
Every aspect of the image is explicitly described, leaving nothing to chance:
- Colors as `(#071D49)` and `(#D4A84E)`, never just "navy" and "gold"
- Lighting as "key light from 10 o'clock at 5200K", never just "from the left"
- Sizes as "occupying 26% of frame height at the lower-right third intersection"
- Distances as "approximately 2 meters behind the subject"
- Quantities as "exactly two slices" and "three potted plants"
The modes/_shared.md ruleset enforces this discipline — the LLM cannot hand-wave abstractly.
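To make the idea of measurable specificity concrete, here is a rough sketch that counts a few of the markers listed above (hex colors, Kelvin color temperatures, clock positions) in a prompt. The regular expressions and the `lintPrompt` function are hypothetical; they are not the rules in `modes/_shared.md`, which are written for the LLM rather than enforced in code.

```javascript
// Hypothetical patterns, loosely modeled on the examples above.
const HEX_COLOR = /#[0-9A-Fa-f]{6}\b/g;
const KELVIN = /\b\d{4,5}K\b/g;
const CLOCK_POSITION = /\b(?:1[0-2]|[1-9]) o'clock\b/g;

// Counts how many specific, measurable markers a prompt contains.
function lintPrompt(text) {
  const count = (pattern) => (text.match(pattern) || []).length;
  return {
    hexColors: count(HEX_COLOR),
    colorTemps: count(KELVIN),
    clockPositions: count(CLOCK_POSITION),
  };
}

const specific =
  "Navy (#071D49) backdrop with gold (#D4A84E) accents; key light from 10 o'clock at 5200K.";
const vague = "Navy backdrop, gold accents, light from the left.";

console.log(lintPrompt(specific)); // { hexColors: 2, colorTemps: 1, clockPositions: 1 }
console.log(lintPrompt(vague));    // { hexColors: 0, colorTemps: 0, clockPositions: 0 }
```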
For brand-consistent logo placement, vague text instructions are not enough. Image Genius forces the prompt-generation LLM to actually open and visually inspect past Instagram posts for each SKU before writing the prompt:
- List past posts in `brand.yml` under `channels.<channel>.skus.<sku>.logo_references`
- The meta-prompt now contains a `MANDATORY PRE-WORK` block that instructs the LLM to use its Read tool to view each reference and extract:
  - Exact corner (top-left/top-right/bottom-left/bottom-right)
  - Logo width as % of canvas
  - Padding from edges as %
  - Tagline arrangement (stacked/beside/below)
- The LLM then bakes these precise measurements directly into the image prompt: "width 14%, padding 2.5% top, 2% right" instead of "top-right corner"
This produces logo placement that matches the brand's established standard across every SKU automatically.
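As an illustration of the last step, the sketch below turns a set of extracted measurements into the kind of phrase quoted above. In Image Genius the LLM writes this text itself; the `describeLogoPlacement` function and the field names of its input object are assumptions made for the example.

```javascript
// Hypothetical shape for measurements extracted from a reference post:
// corner, logo width, vertical/horizontal padding, tagline arrangement.
function describeLogoPlacement(m) {
  // "top-right" -> ["top", "right"]
  const [vertical, horizontal] = m.corner.split("-");
  return (
    `logo in the ${m.corner} corner, width ${m.widthPct}%, ` +
    `padding ${m.padVerticalPct}% ${vertical}, ${m.padHorizontalPct}% ${horizontal}, ` +
    `tagline ${m.tagline}`
  );
}

console.log(
  describeLogoPlacement({
    corner: "top-right",
    widthPct: 14,
    padVerticalPct: 2.5,
    padHorizontalPct: 2,
    tagline: "stacked",
  })
);
// logo in the top-right corner, width 14%, padding 2.5% top, 2% right, tagline stacked
```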
Choose your prompt-generation engine at setup:
- Claude via Claude CLI — leverages Claude's reasoning for nuanced visual descriptions
- GPT via Codex CLI — uses OpenAI's flagship reasoning models
Both CLIs handle their own authentication (subscription login OR API key, your choice). Image generation always uses gpt-image-2 for the final output.
Two image generation modes — toggle with imagegen init:
| Mode | How it works | Cost |
|---|---|---|
| Free quota | Delegates to Codex CLI's built-in image_gen tool, which uses your ChatGPT Plus/Pro subscription | $0 per image |
| API paid | Direct OpenAI Images API calls with your OPENAI_API_KEY | Pay-per-image |
You can switch at any time without losing your prompts or settings.
The classic frustration: you generate something great and want to tweak just one detail, but the next generation has different lighting, a different composition, and a different gold halo around the product.
Image Genius solves this with refine:
- Reverse-engineer: feed your existing image to a vision-capable LLM, which produces an 800-1200 word reproduction prompt describing every visual detail.
- Targeted edit: you describe what to change; the engine applies ONLY that change to the reverse-engineered prompt.
- Regenerate: the new image preserves every detail (lighting, composition, materials, props, color halos) except your specific edit.
imagegen refine output/2026-05-27-aminovital-gold-01.png
> Change the background text from "ENERGY" to "POWER UP"

Built for multi-brand, multi-channel workflows. The included config models Ajinomoto Singapore's three Instagram channels:
- `@ajinomotosg_dryfoods`: seasonings, Blendy coffee (7 SKUs)
- `@ajinomotosgfrozenfoods`: frozen foods
- `@aminovital_sg`: AminoVITAL sports supplements (6 SKUs)
The pipeline:
- Detects the channel from keywords in your description (dryfoods/frozen/aminovital)
- Identifies the SKU by matching aliases (e.g., "msg", "ajinomoto seasoning", "umami" all map to `AJI-NO-MOTO`)
- Verifies product reference photos exist and halts if any are missing
- Loads channel-specific brand colors, mood, and recurring elements
- Picks the correct logo variant (e.g., AminoVITAL has navy + white versions for light/dark scenes)
- Passes the SKU's official photo as a reference image so the generated packaging matches exactly
Generalizes to any brand with a similar multi-channel structure — just edit config/brand.yml.
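The first two pipeline steps can be sketched as plain substring matching. This is an illustrative approximation, not the code in `lib/channel-detector.mjs`: the `catalog` object, the `detect` function, the `AMINOVITAL-GOLD` SKU id, and the fallback of searching every channel's aliases when no channel keyword matches are all assumptions.

```javascript
// Illustrative catalog; the real data lives in config/brand.yml.
const catalog = {
  dryfoods: {
    keywords: ["dryfoods"],
    skus: { "AJI-NO-MOTO": ["msg", "ajinomoto seasoning", "umami"] },
  },
  frozen: { keywords: ["frozen"], skus: {} },
  aminovital: {
    keywords: ["aminovital"],
    skus: { "AMINOVITAL-GOLD": ["aminovital gold"] }, // hypothetical SKU id
  },
};

function detect(description) {
  const text = description.toLowerCase();

  // Step 1: channel from keywords.
  const channel =
    Object.keys(catalog).find((name) =>
      catalog[name].keywords.some((keyword) => text.includes(keyword))
    ) ?? null;

  // Step 2: SKU from aliases. Search the detected channel only, or
  // every channel if no keyword matched (an assumption of this sketch).
  const candidates = channel ? [channel] : Object.keys(catalog);
  for (const name of candidates) {
    for (const [sku, aliases] of Object.entries(catalog[name].skus)) {
      if (aliases.some((alias) => text.includes(alias))) {
        return { channel: name, sku };
      }
    }
  }
  return { channel, sku: null };
}

console.log(detect("AminoVITAL Gold post-workout scene"));
// { channel: 'aminovital', sku: 'AMINOVITAL-GOLD' }
console.log(detect("umami ramen bowl"));
// { channel: 'dryfoods', sku: 'AJI-NO-MOTO' }
```

Because there is no model call involved, the same description always resolves to the same channel and SKU. Plain substring matching can misfire on short aliases such as "msg" that occur inside longer words, so a real implementation may want word-boundary matching.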
image-genius/
+-- cli.mjs                       Standalone CLI entry point (REPL + arg modes)
+-- cli.ps1                       PowerShell wrapper
+-- lib/
|   +-- cli-runner.mjs            Spawns Claude or Codex CLI via stdin pipe
|   +-- channel-detector.mjs      Deterministic keyword + alias matching
|   +-- meta-prompt-builder.mjs   Assembles instructions for the LLM
|   +-- prompt-engine.mjs         Orchestrator (detection -> meta-prompt -> CLI)
+-- templates/
|   +-- food.md                   13-section template for food photography
|   +-- lifestyle.md              13-section template for lifestyle shots
|   +-- product.md                13-section template for product hero shots
+-- modes/
|   +-- _shared.md                Cross-cutting rules and stability anchors
|   +-- generate.md               Generate-mode specifics
+-- config/
|   +-- brand.yml                 Channel + SKU catalog, brand identity, asset paths
|   +-- user-prefs.json           Your model + mode choices (gitignored content)
+-- scripts/
|   +-- init.mjs                  Interactive setup wizard
|   +-- generate-image.mjs        OpenAI API call OR codex delegation
|   +-- reverse-prompt.mjs        Vision-LLM image-to-prompt extractor
|   +-- add-logo.mjs              Fallback logo overlay via sharp
|   +-- doctor.mjs                Environment health check
+-- assets/                       Brand logos, product photos, reference posts
+-- output/                       Generated images (gitignored)
+-- drafts/                       Prompt drafts (gitignored)
- Node.js 18+
- PowerShell (Windows) or any shell that can run `node`
- One of:
  - Claude CLI (`npm install -g @anthropic-ai/claude-code`)
  - Codex CLI (`npm install -g @openai/codex`)
git clone https://github.com/EclairAikome/image-genius.git
cd image-genius
npm install

node cli.mjs init

The wizard will:
- Ask whether you want Claude or GPT for prompt writing
- Launch that CLI's own login flow (subscription or API key)
- Ask whether to use free-quota or API-paid for image generation
- If you picked free-quota and aren't logged into OpenAI yet, launch Codex login
- Show a custom welcome page
node cli.mjs "AminoVITAL Gold post-workout scene"

Or interactive REPL:
node cli.mjs
ig> A steaming bowl of Ajinomoto ramen, warm tones
ig> /regenerate
ig> /refine output/2026-05-27-dryfoods-ramen-01.png
ig> /exit

Add this function to your PowerShell $PROFILE so you can run imagegen from any directory:
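function imagegen {
    # Absolute path to your image-genius checkout; adjust for your machine.
    $projectPath = "D:\path\to\image-genius"
    # Run from the project root; stop here if the path does not exist.
    Push-Location $projectPath -ErrorAction Stop
    try {
        # Suppress Node.js runtime warnings for cleaner CLI output.
        $env:NODE_NO_WARNINGS = "1"
        # Forward all arguments to the CLI unchanged.
        node cli.mjs @args
    }
    finally {
        # Always restore the caller's directory and environment.
        Pop-Location
        Remove-Item Env:NODE_NO_WARNINGS -ErrorAction SilentlyContinue
    }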
}

| Command | Description |
|---|---|
| `imagegen` | Interactive REPL mode |
| `imagegen "<description>"` | One-shot generate |
| `imagegen regenerate` | Fresh generation from the last description |
| `imagegen refine <image-path>` | Reverse-prompt + targeted edits |
| `imagegen prompt-only "<desc>"` | Generate the prompt without calling the image API |
| `imagegen init` | Re-run setup wizard |
| `imagegen doctor` | Environment health check |
| `imagegen config` | Show current configuration |
REPL slash-commands (when inside ig>): /regenerate, /refine <path>, /prompt <desc>, /config, /init, /exit.
By default the CLI shows only clean status milestones (prompt-ready, image-ready). To see the prompt-generation LLM's full thinking and Codex's internal tool calls during image generation, set IMAGEGEN_VERBOSE:
$env:IMAGEGEN_VERBOSE = "1"
imagegen "your description"

Useful when debugging why a reference image wasn't picked up, or why the logo landed in the wrong spot.
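One way such a switch could be wired up is sketched below. This is not the project's actual logging code: the `makeLogger` function, its `milestone` and `detail` methods, and the `[verbose]` prefix are hypothetical. Only the `IMAGEGEN_VERBOSE` variable name and the "1" value come from the usage shown above.

```javascript
// Hypothetical logger: milestones always print, details only in verbose mode.
// `env` would be process.env in a real CLI; `sink` is injectable for testing.
function makeLogger(env, sink = console.log) {
  const verbose = env.IMAGEGEN_VERBOSE === "1";
  return {
    milestone(message) {
      sink(message);
    },
    detail(message) {
      if (verbose) sink(`[verbose] ${message}`);
    },
  };
}

const lines = [];
const log = makeLogger({ IMAGEGEN_VERBOSE: "1" }, (line) => lines.push(line));
log.milestone("prompt-ready");
log.detail("reading reference image");
console.log(lines); // [ 'prompt-ready', '[verbose] reading reference image' ]
```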
Edit config/brand.yml. The schema:
brand:
  name: "Your Brand"
defaults:
  image:
    model: "gpt-image-2"
    size: "1088x1360"  # Closest multiple-of-16 to Instagram 4:5
    quality: "high"
assets:
  ajinomoto_logo: "assets/your-main-logo.png"
  # ... other shared asset paths
channels:
  channel_name:
    instagram_handle: "@your_account"
    description: "What this channel posts"
    product_pictures_dir: "assets/Channel Name"
    keywords: [list, of, detection, signals]
    style:
      primary_colors: ["#HEX1", "#HEX2"]
      photography_style: "..."
      mood: "..."
    logo:
      file: "assets/channel-logo.png"
    skus:
      SKU-ID:
        dir: "Subfolder Name"
        aliases: [search, terms, that, map, to, this, sku]
        logo_references: ["past-post-1.png", "past-post-2.png"]

The pipeline auto-discovers everything from this file.
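The `size` value in the schema can be reproduced with a little arithmetic. The sketch below assumes the target is the common 1080x1350 Instagram portrait size and that both sides must be multiples of 16 while keeping an exact 4:5 ratio; the `snapSize` helper is hypothetical and not part of Image Genius.

```javascript
// For a ratio in lowest terms (e.g. 4:5), every size with both sides
// divisible by `step` has the form (step*ratioW*k) x (step*ratioH*k).
// Pick the k whose width is closest to the target width.
function snapSize(targetWidth, ratioW, ratioH, step) {
  const k = Math.max(1, Math.round(targetWidth / (step * ratioW)));
  return { width: step * ratioW * k, height: step * ratioH * k };
}

const { width, height } = snapSize(1080, 4, 5, 16);
console.log(`${width}x${height}`); // 1088x1360
```

With a step of 16 and a 4:5 ratio the valid widths are multiples of 64, so 1080 snaps up to 1088 (k = 17) rather than down to 1024 (k = 16).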
Image generation models trade precision against randomness at every unspecified visual dimension. A 100-word prompt leaves hundreds of decisions to the model's prior distribution — lighting direction, exact colors, prop placement, material textures, atmospheric haze. Each unspecified dimension is a roll of the dice.
Long, hyper-specific prompts lock down nearly every visual decision, making the output a reproducible interpretation of explicit instructions rather than a stochastic guess. The approach relies on gpt-image-2 honoring 1000+ word constraints faithfully.
Image Genius enforces this discipline automatically: you provide a one-sentence brief, the engine produces an 800+ word structured prompt with hex codes, grid coordinates, and lighting rigs.
The same principle drives refine — instead of describing changes on top of a previous generation (which compounds randomness), we first establish a 1000-word ground-truth prompt from the actual image, then make surgical edits. The new output has the same level of specificity as the original generation's full context.
MIT.
Built for Ajinomoto Singapore and generalized for any multi-channel brand workflow.