Image Genius

AI-powered Instagram content generation that doesn't drift. Standalone PowerShell CLI · Claude or GPT for prompt writing · Free quota via ChatGPT Plus subscription

The problem

Hand-crafting on-brand Instagram images in Canva takes 1-2 hours per post. The common "ask Claude for a prompt, paste it into ChatGPT" shortcut speeds things up but has three persistent pain points:

Cross-app copy-paste fatigue — endless back-and-forth between tabs, manually saving drafts.
Prompts vary wildly across sessions — same idea, different length, different style, different result.
Regeneration drifts — ask GPT Image to tweak one thing, and it quietly changes the elements you wanted to keep.

Image Genius is a single PowerShell command that solves all three.

Highlights

Long, hyper-detailed prompts built for gpt-image-2

Image generation models trade off precision against randomness at every unspecified dimension. The shorter the prompt, the more decisions the model improvises — and the less reproducible the output.

Image Genius templates produce 600-1200 word structured prompts with 13 mandatory sections:

Section	What it locks down
Camera setup	Focal length, aperture, depth of field, sensor type
Primary subject	Exhaustive description with measurable details
Product packaging	Preserves SKU exactly via reference image
Secondary props	Position, material, color, size relative to subject
Surface & background	Material, distance, blur characteristics
Spatial layout	Rule-of-thirds grid, frame percentages, geometry
Lighting rig	Key/fill/rim/practical lights with clock positions and color temps
Color palette	Every color with hex code AND coverage %
Material & texture map	Reflectivity, finish, special properties per surface
Brand color integration	Channel-specific colors woven into the scene
Atmospheric effects	Bokeh shape, haze density, lens artifacts
Mood & style anchor	Tone + photographic reference
Negative prompt	Explicit exclusions (text, watermarks, faces, AI tells)

The result: outputs that are reproducible interpretations of explicit instructions rather than stochastic guesses.
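
A generated prompt can be checked against this table before it is sent to the image model. The sketch below is a minimal, hypothetical validator: `REQUIRED_SECTIONS` and `checkPrompt` are illustrative names, and it assumes the section names appear verbatim somewhere in the prompt text, which may not match how templates/*.md actually spell their headings.

```javascript
// The 13 section names from the table above. How the real templates spell
// their headings is an assumption; adjust to match templates/*.md.
const REQUIRED_SECTIONS = [
  "Camera setup",
  "Primary subject",
  "Product packaging",
  "Secondary props",
  "Surface & background",
  "Spatial layout",
  "Lighting rig",
  "Color palette",
  "Material & texture map",
  "Brand color integration",
  "Atmospheric effects",
  "Mood & style anchor",
  "Negative prompt",
];

// Hypothetical helper: reports which sections are missing and whether the
// word count falls inside the 600-1200 range quoted above.
function checkPrompt(promptText, minWords = 600, maxWords = 1200) {
  const lower = promptText.toLowerCase();
  const missing = REQUIRED_SECTIONS.filter(
    (name) => !lower.includes(name.toLowerCase())
  );
  const wordCount = promptText.trim().split(/\s+/).filter(Boolean).length;
  const lengthOk = wordCount >= minWords && wordCount <= maxWords;
  return { missing, wordCount, lengthOk, ok: missing.length === 0 && lengthOk };
}
```

A draft that fails the check could be sent back to the prompt-writing LLM with the `missing` list attached, rather than being passed on to image generation.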

Stability through specificity

Every aspect of the image is explicitly described, leaving nothing to chance:

Colors as (#071D49) and (#D4A84E), never just "navy" and "gold"
Lighting as "key light from 10 o'clock at 5200K", never just "from the left"
Sizes as "occupying 26% of frame height at the lower-right third intersection"
Distances as "approximately 2 meters behind the subject"
Quantities as "exactly two slices" and "three potted plants"

The modes/_shared.md ruleset enforces this discipline, so the LLM cannot fall back on vague descriptions.
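
The same discipline can be spot-checked mechanically. The following sketch flags sentences that name a color without a hex code or describe light direction without a clock position. The word lists and the `findVagueTerms` name are illustrative assumptions, not the contents of modes/_shared.md.

```javascript
// Illustrative word lists only; the real rules live in modes/_shared.md.
const VAGUE_COLOR_WORDS = ["navy", "gold", "red", "blue", "green"];
const VAGUE_DIRECTIONS = ["from the left", "from the right", "from above"];

// Hypothetical linter: returns one entry per vague term found.
function findVagueTerms(promptText) {
  const problems = [];
  // Split into sentences so a hex code only excuses colors in its own sentence.
  const sentences = promptText.split(/(?<=[.!?])\s+/);
  for (const sentence of sentences) {
    const lower = sentence.toLowerCase();
    const hasHex = /#[0-9a-f]{6}\b/i.test(sentence);
    for (const word of VAGUE_COLOR_WORDS) {
      const wordPattern = new RegExp(`\\b${word}\\b`);
      if (wordPattern.test(lower) && !hasHex) {
        problems.push({ term: word, reason: "color without hex code", sentence });
      }
    }
    for (const phrase of VAGUE_DIRECTIONS) {
      if (lower.includes(phrase)) {
        problems.push({ term: phrase, reason: "direction without clock position", sentence });
      }
    }
  }
  return problems;
}
```

For example, "A navy backdrop. Key light from the left." yields two findings, while "Backdrop in navy (#071D49). Key light from 10 o'clock at 5200K." yields none.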

Mandatory visual inspection of past posts

For brand-consistent logo placement, vague text instructions are not enough. Image Genius forces the prompt-generation LLM to actually open and visually inspect past Instagram posts for each SKU before writing the prompt:

List past posts in brand.yml under channels.<channel>.skus.<sku>.logo_references
The meta-prompt contains a MANDATORY PRE-WORK block that instructs the LLM to use its Read tool to view each reference and extract:
- Exact corner (top-left/top-right/bottom-left/bottom-right)
- Logo width as % of canvas
- Padding from edges as %
- Tagline arrangement (stacked/beside/below)
The LLM then bakes these precise measurements directly into the image prompt — "width 14%, padding 2.5% top, 2% right" instead of "top-right corner"

This produces logo placement that matches the brand's established standard across every SKU automatically.
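
To make the percentages concrete, the sketch below converts a measurement such as "width 14%, padding 2.5% top, 2% right" into a pixel rectangle on a 1088x1360 canvas. `logoBox` is a hypothetical helper, and it assumes that width and horizontal padding are percentages of canvas width, that vertical padding is a percentage of canvas height, and that the logo's aspect ratio is known. The README does not specify these conventions.

```javascript
// Hypothetical helper: turns percentage measurements into a pixel rectangle.
// Assumptions: widthPct and padXPct are relative to canvas width, padYPct is
// relative to canvas height, and logoAspect is logo width divided by height.
function logoBox({ canvasWidth, canvasHeight, corner, widthPct, padXPct, padYPct, logoAspect }) {
  const width = Math.round((canvasWidth * widthPct) / 100);
  const height = Math.round(width / logoAspect);
  const padX = Math.round((canvasWidth * padXPct) / 100);
  const padY = Math.round((canvasHeight * padYPct) / 100);
  // corner is one of: top-left, top-right, bottom-left, bottom-right
  const left = corner.endsWith("left") ? padX : canvasWidth - padX - width;
  const top = corner.startsWith("top") ? padY : canvasHeight - padY - height;
  return { left, top, width, height };
}

const box = logoBox({
  canvasWidth: 1088,
  canvasHeight: 1360,
  corner: "top-right",
  widthPct: 14,
  padXPct: 2,
  padYPct: 2.5,
  logoAspect: 2, // made-up aspect ratio for illustration
});
console.log(box); // { left: 914, top: 34, width: 152, height: 76 }
```

A rectangle like this is also the kind of input a fallback overlay step would need if the logo is composited after generation.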

Multi-model choice — your subscription, your pick

Choose your prompt-generation engine at setup:

Claude via Claude CLI — leverages Claude's reasoning for nuanced visual descriptions
GPT via Codex CLI — uses OpenAI's flagship reasoning models

Both CLIs handle their own authentication (subscription login OR API key, your choice). Image generation always uses gpt-image-2 for the final output.

Free quota via ChatGPT Plus subscription

Two image generation modes — toggle with imagegen init:

Mode	How it works	Cost
Free quota	Delegates to Codex CLI's built-in `image_gen` tool, which uses your ChatGPT Plus/Pro subscription	$0 per image
API paid	Direct OpenAI Images API calls with your `OPENAI_API_KEY`	Pay-per-image

You can switch at any time without losing your prompts or settings.

Refine mode — surgical edits without regeneration drift

The classic frustration: you generate something great and want to tweak just one detail, but the next generation has different lighting, a different composition, and a different gold halo around the product.

Image Genius solves this with refine:

Reverse-engineer: feed your existing image to a vision-capable LLM, which produces an 800-1200 word reproduction prompt describing every visual detail.
Targeted edit: you describe what to change, and the engine applies ONLY that change to the reverse-engineered prompt.
Regenerate: the new image preserves every detail (lighting, composition, materials, props, color halos) except your specific edit.
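
The targeted-edit step is where drift is most likely to creep back in, so the instruction given to the LLM has to be narrow. The sketch below shows one way that instruction might be assembled. `buildRefineInstruction` and its wording are assumptions for illustration, not the text the tool actually sends.

```javascript
// Hypothetical helper: wraps the reverse-engineered prompt and the user's
// edit into a single instruction for the prompt-writing LLM.
function buildRefineInstruction(reproductionPrompt, editRequest) {
  const edit = editRequest.trim();
  if (!edit) {
    throw new Error("Describe the change you want before refining.");
  }
  return [
    "Below is a reproduction prompt that describes an existing image.",
    "Apply ONLY the requested change. Copy every other sentence verbatim.",
    "",
    "REQUESTED CHANGE:",
    edit,
    "",
    "REPRODUCTION PROMPT:",
    reproductionPrompt.trim(),
  ].join("\n");
}
```

Keeping the reproduction prompt intact and stating the change separately means the regenerated image is driven by the same detailed description as before, with one deliberate difference.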

imagegen refine output/2026-05-27-aminovital-gold-01.png
> Change the background text from "ENERGY" to "POWER UP"

Channel and SKU aware

Built for multi-brand, multi-channel workflows. The included config models Ajinomoto Singapore's three Instagram channels:

@ajinomotosg_dryfoods — seasonings, Blendy coffee (7 SKUs)
@ajinomotosgfrozenfoods — frozen foods
@aminovital_sg — AminoVITAL sports supplements (6 SKUs)

The pipeline:

Detects the channel from keywords in your description (dryfoods/frozen/aminovital)
Identifies the SKU by matching aliases (e.g., "msg", "ajinomoto seasoning", "umami" all map to AJI-NO-MOTO)
Verifies product reference photos exist — halts if missing
Loads channel-specific brand colors, mood, and recurring elements
Picks the correct logo variant (e.g., AminoVITAL has navy + white versions for light/dark scenes)
Passes the SKU's official photo as a reference image so the generated packaging matches exactly

The same pipeline generalizes to any brand with a similar multi-channel structure: just edit config/brand.yml.
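
The first two pipeline steps (channel detection and SKU alias matching) can be sketched as plain substring matching. This is a minimal illustration, not the code in lib/channel-detector.mjs: the `detect` function, the `sampleConfig` object, the `AMINOVITAL-GOLD` SKU ID, and the alias lists other than the AJI-NO-MOTO ones are all assumptions.

```javascript
// Trimmed-down, hypothetical config in the shape of config/brand.yml.
const sampleConfig = {
  channels: {
    dryfoods: {
      keywords: ["dryfoods"],
      skus: { "AJI-NO-MOTO": { aliases: ["msg", "ajinomoto seasoning", "umami"] } },
    },
    frozen: { keywords: ["frozen"], skus: {} },
    aminovital: {
      keywords: ["aminovital"],
      skus: { "AMINOVITAL-GOLD": { aliases: ["aminovital gold"] } }, // made-up SKU ID
    },
  },
};

// Alias matches win over channel keywords; the longest matching alias wins.
function detect(description, cfg) {
  const text = description.toLowerCase();
  let best = null;
  for (const [channel, def] of Object.entries(cfg.channels)) {
    for (const [sku, skuDef] of Object.entries(def.skus)) {
      for (const alias of skuDef.aliases) {
        const hit = text.includes(alias.toLowerCase());
        if (hit && (best === null || alias.length > best.alias.length)) {
          best = { channel, sku, alias };
        }
      }
    }
  }
  if (best !== null) return { channel: best.channel, sku: best.sku };
  // No SKU alias matched: fall back to channel keywords only.
  for (const [channel, def] of Object.entries(cfg.channels)) {
    if (def.keywords.some((k) => text.includes(k.toLowerCase()))) {
      return { channel, sku: null };
    }
  }
  return { channel: null, sku: null };
}

console.log(detect("AminoVITAL Gold post-workout scene", sampleConfig));
// { channel: 'aminovital', sku: 'AMINOVITAL-GOLD' }
```

Plain substring matching is deterministic but blunt: a short alias such as "msg" also matches inside longer words, so a real implementation may want word-boundary checks.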

Architecture

image-genius/
+-- cli.mjs                       Standalone CLI entry point (REPL + arg modes)
+-- cli.ps1                       PowerShell wrapper
+-- lib/
|   +-- cli-runner.mjs            Spawns Claude or Codex CLI via stdin pipe
|   +-- channel-detector.mjs      Deterministic keyword + alias matching
|   +-- meta-prompt-builder.mjs   Assembles instructions for the LLM
|   +-- prompt-engine.mjs         Orchestrator (detection -> meta-prompt -> CLI)
+-- templates/
|   +-- food.md                   13-section template for food photography
|   +-- lifestyle.md              13-section template for lifestyle shots
|   +-- product.md                13-section template for product hero shots
+-- modes/
|   +-- _shared.md                Cross-cutting rules and stability anchors
|   +-- generate.md               Generate-mode specifics
+-- config/
|   +-- brand.yml                 Channel + SKU catalog, brand identity, asset paths
|   +-- user-prefs.json           Your model + mode choices (gitignored content)
+-- scripts/
|   +-- init.mjs                  Interactive setup wizard
|   +-- generate-image.mjs        OpenAI API call OR codex delegation
|   +-- reverse-prompt.mjs        Vision-LLM image-to-prompt extractor
|   +-- add-logo.mjs              Fallback logo overlay via sharp
|   +-- doctor.mjs                Environment health check
+-- assets/                       Brand logos, product photos, reference posts
+-- output/                       Generated images (gitignored)
+-- drafts/                       Prompt drafts (gitignored)

Quick start

Prerequisites

Node.js 18+
PowerShell (Windows) or any shell that can run node
One of:
- Claude CLI (npm install -g @anthropic-ai/claude-code)
- Codex CLI (npm install -g @openai/codex)

Install

git clone https://github.com/EclairAikome/image-genius.git
cd image-genius
npm install

One-time setup

node cli.mjs init

The wizard will:

Ask whether you want Claude or GPT for prompt writing
Launch that CLI's own login flow (subscription or API key)
Ask whether to use free-quota or API-paid for image generation
If you picked free-quota and aren't logged into OpenAI yet, launch Codex login
Show a custom welcome page

Generate

node cli.mjs "AminoVITAL Gold post-workout scene"

Or interactive REPL:

node cli.mjs
ig> A steaming bowl of Ajinomoto ramen, warm tones
ig> /regenerate
ig> /refine output/2026-05-27-dryfoods-ramen-01.png
ig> /exit

Convenience: global launcher

Add this function to your PowerShell $PROFILE so you can run imagegen from any directory:

function imagegen {
    $projectPath = "D:\path\to\image-genius"
    Push-Location $projectPath
    try {
        $env:NODE_NO_WARNINGS = "1"
        node cli.mjs @args
    }
    finally {
        Pop-Location
        Remove-Item Env:NODE_NO_WARNINGS -ErrorAction SilentlyContinue
    }
}

Commands

Command	Description
`imagegen`	Interactive REPL mode
`imagegen "<description>"`	One-shot generate
`imagegen regenerate`	Fresh generation from the last description
`imagegen refine <image-path>`	Reverse-prompt + targeted edits
`imagegen prompt-only "<desc>"`	Generate the prompt without calling the image API
`imagegen init`	Re-run setup wizard
`imagegen doctor`	Environment health check
`imagegen config`	Show current configuration

REPL slash-commands (when inside ig>): /regenerate, /refine <path>, /prompt <desc>, /config, /init, /exit.
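
The split between free-text descriptions and slash-commands can be illustrated with a small parser. This is a sketch only: `parseReplLine` and its return shape are hypothetical, and the actual handling in cli.mjs may differ.

```javascript
// Slash-commands listed above.
const SLASH_COMMANDS = new Set(["regenerate", "refine", "prompt", "config", "init", "exit"]);

// Hypothetical parser: anything that does not start with "/" is treated as
// an image description to generate from.
function parseReplLine(line) {
  const trimmed = line.trim();
  if (!trimmed) return { command: "noop", arg: "" };
  if (!trimmed.startsWith("/")) return { command: "generate", arg: trimmed };
  const [name, ...rest] = trimmed.slice(1).split(/\s+/);
  if (!SLASH_COMMANDS.has(name)) return { command: "unknown", arg: name };
  return { command: name, arg: rest.join(" ") };
}

console.log(parseReplLine("/refine output/2026-05-27-dryfoods-ramen-01.png"));
// { command: 'refine', arg: 'output/2026-05-27-dryfoods-ramen-01.png' }
```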

Debug / verbose mode

By default the CLI shows only clean status milestones (prompt-ready, image-ready). To see the prompt-generation LLM's full thinking and Codex's internal tool calls during image generation:

$env:IMAGEGEN_VERBOSE = "1"
imagegen "your description"

Useful when debugging why a reference image wasn't picked up, or why the logo landed in the wrong spot.
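
On the Node side, a flag like this is typically read from `process.env`. The sketch below shows one way milestone and debug output might be separated. `makeLogger` is a hypothetical helper, and treating only the string "1" as "on" is an assumption based on the example above.

```javascript
// Assumption: the flag counts as "on" only when set to the string "1".
function isVerbose(env = process.env) {
  return env.IMAGEGEN_VERBOSE === "1";
}

// Hypothetical logger: milestones always print, debug lines only when verbose.
function makeLogger(env = process.env, write = console.log) {
  const verbose = isVerbose(env);
  return {
    milestone: (message) => write(message),
    debug: (message) => {
      if (verbose) write(`[debug] ${message}`);
    },
  };
}
```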

Customizing for your brand

Edit config/brand.yml. The schema:

brand:
  name: "Your Brand"

defaults:
  image:
    model: "gpt-image-2"
    size: "1088x1360"      # Exact 4:5, both sides multiples of 16; closest such size to Instagram's 1080x1350
    quality: "high"

assets:
  ajinomoto_logo: "assets/your-main-logo.png"
  # ... other shared asset paths

channels:
  channel_name:
    instagram_handle: "@your_account"
    description: "What this channel posts"
    product_pictures_dir: "assets/Channel Name"
    keywords: [list, of, detection, signals]
    style:
      primary_colors: ["#HEX1", "#HEX2"]
      photography_style: "..."
      mood: "..."
    logo:
      file: "assets/channel-logo.png"
    skus:
      SKU-ID:
        dir: "Subfolder Name"
        aliases: [search, terms, that, map, to, this, sku]
        logo_references: ["past-post-1.png", "past-post-2.png"]

The pipeline auto-discovers everything from this file.
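
If you change `defaults.image.size`, the value can be derived rather than guessed. The sketch below finds the size nearest a target width that keeps an exact aspect ratio with both sides divisible by 16, which is the constraint stated in the schema comment. `alignedSize` is a hypothetical helper, and the ratio must be given in lowest terms.

```javascript
// Hypothetical helper. For a ratio W:H in lowest terms, every size with both
// sides divisible by `align` is a whole multiple of (align*W) x (align*H).
function alignedSize(targetWidth, ratioW, ratioH, align = 16) {
  const stepW = align * ratioW;
  const stepH = align * ratioH;
  const k = Math.max(1, Math.round(targetWidth / stepW));
  return { width: k * stepW, height: k * stepH };
}

console.log(alignedSize(1080, 4, 5)); // { width: 1088, height: 1360 }
```

Rounding each side of 1080x1350 to a multiple of 16 independently would give 1088x1344, which is no longer exactly 4:5. Stepping both sides together avoids that.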

Why this works (the science)

Image generation models trade precision against randomness at every unspecified visual dimension. A 100-word prompt leaves hundreds of decisions to the model's prior distribution — lighting direction, exact colors, prop placement, material textures, atmospheric haze. Each unspecified dimension is a roll of the dice.

Long, hyper-specific prompts lock down nearly every visual decision, making the output a near-deterministic interpretation of explicit instructions rather than a stochastic guess. gpt-image-2 follows long prompts closely enough to honor 1000+ word constraints.

Image Genius enforces this discipline automatically: you provide a one-sentence brief, and the engine produces an 800+ word structured prompt with hex codes, grid coordinates, and lighting rigs.

The same principle drives refine. Instead of describing changes on top of a previous generation (which compounds randomness), it first establishes a roughly 1000-word ground-truth prompt from the actual image, then makes surgical edits. The edited prompt is therefore as specific as the full prompt behind an original generation.

License

MIT.

Built for Ajinomoto Singapore and generalized for any multi-channel brand workflow.
