Anita — AI-Powered Anki Deck Generator

Turn a plain CSV of word pairs into a rich, multimedia Anki deck with AI-generated native-like pronunciations and optional illustrations — in minutes, for any language pair.

Why Anita?

Language learners lose hours building decent flashcards by hand. Anita automates the tedious part — generating TTS audio (OpenAI or ElevenLabs) and optional DALL·E images — so you can focus on reviewing, not curating. Feed it a CSV, get back a .apkg you can import straight into Anki on desktop or mobile.

Features

CSV in, .apkg out — point it at a two-column CSV and get a ready-to-import Anki deck.
Pluggable TTS — OpenAI tts-1 by default, ElevenLabs multilingual v2 optional.
Optional illustrations — DALL·E 2 images auto-resized to 128×128 px for clean cards.
Local media cache — a SQLite index records every generated asset, so repeat runs skip paid API calls and finish quickly. See Cache for location and lifecycle.
Language-agnostic — works for any source → target language pair.
Clean card template — distraction-free front/back with audio playback and image.

Quickstart

# Install
uv tool install anita-anki  # or: pipx install anita-anki

# Set credentials (or copy `.env.example` → `.env` and edit)
export OPENAI_API_KEY=sk-...
# Optional:
export ELEVENLABS_API_KEY=...

# Generate
anita generate examples/basics.csv my_deck.apkg --deck-name "My Vocabulary"

Import my_deck.apkg into Anki and start reviewing.

Installation

From PyPI (recommended)

uv tool install anita-anki
# or
pipx install anita-anki
# or
pip install anita-anki

The distribution is published as anita-anki on PyPI (the name anita was taken), but the import name and CLI are both anita.

From source (development)

git clone https://github.com/timpara/anita.git
cd anita
uv sync --all-extras
uv run anita --help

Usage

CLI

anita generate INPUT.csv OUTPUT.apkg [OPTIONS]

Common options:

Flag	Default	Description
`--deck-name`	`Anita Vocabulary`	Deck name shown inside Anki.
`--tts`	`openai`	TTS provider: `openai` or `elevenlabs`.
`--images / --no-images`	`--no-images`	Generate DALL·E illustrations per card.
`--voice-id`	(elevenlabs preset)	ElevenLabs voice ID.
`--verbose`	`false`	Enable debug logging.

Run anita generate --help for the full list.

Python API

from anita import AnkiDeckGenerator

generator = AnkiDeckGenerator(
    deck_name="Italian Restaurant",
    tts_provider="elevenlabs",
    generate_images=True,
)
generator.generate_deck("examples/restaurant.csv", "restaurant.apkg")

CSV format

Two columns: source word (prompt side) and target word (answer side). Header row is optional and auto-detected.

apple,mela
house,casa
book,libro
water,acqua

Working examples live in examples/.
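
To make the header handling concrete, here is a minimal sketch of reading such a file. It is not Anita's actual parser: the `read_pairs` helper and the set of recognized header names are assumptions made purely for illustration.

```python
import csv
import io

# Hypothetical header names; Anita's real detection rule is not documented here.
HEADER_NAMES = {"source", "target", "front", "back"}


def read_pairs(csv_text):
    """Return (source, target) tuples, skipping an optional header row."""
    # csv.reader yields an empty list for blank lines; drop those.
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    pairs = []
    for index, row in enumerate(rows):
        if len(row) != 2:
            raise ValueError(f"row {index + 1}: expected 2 columns, got {len(row)}")
        source, target = (cell.strip() for cell in row)
        # Treat the first row as a header only if both cells look like column names.
        if index == 0 and {source.lower(), target.lower()} <= HEADER_NAMES:
            continue
        pairs.append((source, target))
    return pairs
```

Calling `read_pairs("source,target\nbook,libro\n")` under these assumptions yields a single pair, while a file without a header keeps its first row.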

Configuration

API keys are read from environment variables. A .env file in the working directory is auto-loaded if present — the fastest way to get started is:

cp .env.example .env
# then edit .env with your real keys

.env is git-ignored; never commit it.

Variable	Required for
`OPENAI_API_KEY`	OpenAI TTS, DALL·E
`ELEVENLABS_API_KEY`	ElevenLabs TTS (optional)
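
The table above can be turned into a quick pre-flight check before a long run. The sketch below is an illustration only: `missing_keys` is a hypothetical helper, not part of Anita, and it assumes the key requirements are exactly those listed in the table.

```python
import os


def missing_keys(tts_provider="openai", generate_images=False, env=None):
    """List the environment variables a run would need but cannot find."""
    env = os.environ if env is None else env
    required = set()
    if tts_provider == "openai" or generate_images:
        required.add("OPENAI_API_KEY")      # OpenAI TTS and DALL·E
    if tts_provider == "elevenlabs":
        required.add("ELEVENLABS_API_KEY")  # ElevenLabs TTS
    # An empty string counts as missing, same as an unset variable.
    return sorted(name for name in required if not env.get(name))
```

An empty list means every key the chosen combination needs is present. Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.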

The cache index lives under your OS user-cache directory (resolved via platformdirs), so re-running on the same words reuses existing media and incurs no API cost. See Cache for details.

Cost estimate

Service	Use case	Model	Approximate cost
OpenAI	TTS	`tts-1`	$0.015 / 1k characters
OpenAI	Image generation	DALL·E 2	$0.020 / image (256×256)
ElevenLabs	Premium TTS	Multilingual v2	Per your subscription tier

A 500-word deck with audio-only (OpenAI) typically costs well under $0.50.
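
As a rough check on that figure, the arithmetic can be sketched as follows. The per-character rate comes from the table above. The assumptions that each row produces one audio clip and that rows average 8 characters are illustrative, not taken from Anita's implementation.

```python
TTS_USD_PER_1K_CHARS = 0.015  # OpenAI tts-1 rate from the table above


def estimate_tts_cost(texts):
    """Approximate OpenAI TTS cost in USD for the given strings."""
    total_chars = sum(len(text) for text in texts)
    return total_chars / 1000 * TTS_USD_PER_1K_CHARS


# Assumed deck: 500 rows of 8 characters each, i.e. 4,000 characters in total.
deck = ["x" * 8] * 500
print(f"${estimate_tts_cost(deck):.2f}")
```

Even if every row were a 30-character phrase, 500 rows would total 15,000 characters, or about $0.23 at this rate.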

Runtime estimates

These are rough wall-clock figures for a fresh run (no cache hits) on a 100 Mbps connection. The dominant factor is the per-item round-trip latency to the provider API; CPU and disk are negligible. Cached items skip the network entirely and complete in milliseconds.

Rows	Providers	Typical wall-clock
50	gTTS only	~30 s
50	OpenAI TTS only	~40 s
50	OpenAI TTS + DALL·E 2 images	~3–5 min
500	gTTS only	~5 min
500	OpenAI TTS only	~7 min
500	OpenAI TTS + DALL·E 2 images	~30–50 min

Tips:

The SQLite cache is populated per-item, so an interrupted run resumes cheaply.
Image generation is by far the slowest step — run audio-only first and add images on a second pass.
Provider rate limits (not your bandwidth) usually cap throughput; expect diminishing returns from parallelism.

Cache

Anita keeps a small SQLite index of previously generated media so that re-running anita generate on the same CSV skips paid API calls. The index stores only filename mappings: each (source text, target text) pair maps to its audio_fname and image_fname. It does not store API keys, prompts, generated audio, image bytes, or any other content. The media files themselves live in the media/ directory you pass to the CLI.
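
To illustrate what a filename-only index looks like, here is a small sketch using the standard-library sqlite3 module. The table name, column layout, and helper functions are all hypothetical. Anita's real schema may differ. Only the column names audio_fname and image_fname come from the description above.

```python
import sqlite3


def open_index(db_path=":memory:"):
    """Open (or create) a hypothetical filename index."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS cards (
               source TEXT NOT NULL,
               target TEXT NOT NULL,
               audio_fname TEXT,
               image_fname TEXT,
               PRIMARY KEY (source, target)
           )"""
    )
    return conn


def remember(conn, source, target, audio_fname=None, image_fname=None):
    """Record which media files belong to a word pair; no media bytes are stored."""
    conn.execute(
        "INSERT OR REPLACE INTO cards VALUES (?, ?, ?, ?)",
        (source, target, audio_fname, image_fname),
    )
    conn.commit()


def lookup(conn, source, target):
    """Return (audio_fname, image_fname) for a pair, or None if it is not indexed."""
    return conn.execute(
        "SELECT audio_fname, image_fname FROM cards WHERE source = ? AND target = ?",
        (source, target),
    ).fetchone()
```

A lookup hit means the generator can reuse the named files instead of calling the provider again. A miss means the pair has never been generated.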

Location

The database path is resolved by platformdirs.user_cache_dir("anita"):

OS	Default path
Linux	`~/.cache/anita/generated_cards.db`
macOS	`~/Library/Caches/anita/generated_cards.db`
Windows	`%LOCALAPPDATA%\anita\anita\Cache\generated_cards.db`

Respects XDG_CACHE_HOME on Linux.
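
The Linux rule can be sketched with the standard library alone. This mirrors the documented behavior (XDG_CACHE_HOME when set, otherwise ~/.cache) rather than calling platformdirs itself, and `linux_cache_db` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def linux_cache_db(env, home):
    """Resolve the expected Linux database path from an environment mapping."""
    # An unset or empty XDG_CACHE_HOME falls back to ~/.cache.
    base = env.get("XDG_CACHE_HOME", "").strip() or str(PurePosixPath(home) / ".cache")
    return PurePosixPath(base) / "anita" / "generated_cards.db"
```

For the authoritative path on any OS, prefer `anita cache path`, which prints the location actually in use.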

Lifecycle and disk reconciliation

Anita checks each cached filename against media/ on every run. If you delete a file from media/, the next anita generate regenerates just that asset (audio and image are handled independently). Known-failed generations are remembered so that a flaky provider is not hammered on every retry.
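
The reconciliation step described above amounts to a per-asset existence check. The following sketch shows the idea under stated assumptions: `assets_to_regenerate` is a hypothetical helper, a row with no recorded filename is treated as having nothing to check, and the tracking of known-failed generations is not modeled.

```python
from pathlib import Path


def assets_to_regenerate(media_dir, audio_fname, image_fname):
    """Return which of a row's cached assets are missing from media_dir."""
    media_dir = Path(media_dir)
    missing = []
    # Audio and image are checked independently, so one can be rebuilt alone.
    for kind, fname in (("audio", audio_fname), ("image", image_fname)):
        if fname is not None and not (media_dir / fname).is_file():
            missing.append(kind)
    return missing
```

If only the image file has been deleted, the result is `["image"]`, so the audio clip is reused and only the illustration is requested again.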

Clearing the cache

Anita ships a cache subcommand group:

anita cache path                          # print the DB path
anita cache show                          # table of cached (source, target, audio?, image?)
anita cache show --json                   # machine-readable output
anita cache clear --yes                   # delete the DB (prompts without --yes)
anita cache prune --missing-media media/  # drop rows whose media files are gone

If you prefer manual cleanup, remove the file directly:

# Linux
rm ~/.cache/anita/generated_cards.db

# macOS
rm ~/Library/Caches/anita/generated_cards.db

# Windows
Remove-Item "$env:LOCALAPPDATA\anita\anita\Cache\generated_cards.db"

Or pass a project-local cache path when using the Python API:

from pathlib import Path
from anita.cache import MediaCache
cache = MediaCache(db_path=Path("./anita-cache.db"))

Known limitations

OpenAI TTS 4096-char cap. The tts-1 endpoint rejects any request longer than 4096 characters. Anita's rows are typically short words or phrases, so this almost never bites — but if you pass long example sentences, split them first.
DALL·E 2 availability. DALL·E 2 is deprecated for new OpenAI accounts and may be unavailable depending on when your account was created. Existing accounts can still generate images; new accounts should prefer Stability AI.
ElevenLabs free-tier caps. The free tier has strict monthly character limits that a single large deck can exhaust. Audit your remaining quota before launching a 500-row run.
AnkiWeb sync. genanki produces a valid .apkg, but AnkiWeb cloud sync still requires you to import the file via the desktop Anki client at least once. There is no direct .apkg → AnkiWeb upload path.
Unicode normalization. Source/target strings are cached verbatim. A word written in NFC on macOS and NFD on Linux will produce distinct cache keys and regenerate media. If you move a deck between platforms, consider pre-normalizing your CSV with unicodedata.normalize("NFC", ...).
Non-deterministic .apkg bytes. genanki embeds timestamps, so two identical runs produce different archive bytes. Tracked in #38.
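
Two of the limitations above, Unicode normalization and the 4096-character TTS cap, can be handled in a small pre-processing pass over your rows before they reach Anita. The sketch below is one way to do that. `prepare_row` is a hypothetical helper, and the length check assumes the cap applies to each field separately.

```python
import unicodedata

OPENAI_TTS_MAX_CHARS = 4096  # tts-1 input limit noted above


def prepare_row(source, target):
    """NFC-normalize both fields and reject text too long for one TTS request."""
    source = unicodedata.normalize("NFC", source.strip())
    target = unicodedata.normalize("NFC", target.strip())
    for label, text in (("source", source), ("target", target)):
        if len(text) > OPENAI_TTS_MAX_CHARS:
            raise ValueError(f"{label} text is {len(text)} characters; split it first")
    return source, target
```

After this pass, a decomposed "e" plus combining acute accent and the precomposed "é" compare equal, so both spellings map to the same cache key.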

Contributing

Contributions welcome! See CONTRIBUTING.md for dev setup, coding style, and PR conventions. By participating you agree to the Code of Conduct.

To report a security issue, please see SECURITY.md.

License

Supply chain

Anita takes a few concrete steps to be a well-behaved dependency:

PyPI OIDC trusted publishing — releases are uploaded from GitHub Actions without any long-lived API token.
Sigstore attestations — every wheel and sdist on PyPI is signed by pypa/gh-action-pypi-publish, so you can verify provenance.
CycloneDX SBOM — each GitHub Release ships an anita-v<version>-sbom.cdx.json bill of materials (CycloneDX 1.5) listing every locked dependency.
Secret scanning — gitleaks runs on every PR, plus GitHub-native push protection.
Dependency auditing — pip-audit scans uv.lock on every PR and weekly against OSV.dev.

See SECURITY.md for details and vulnerability reporting.

Acknowledgments

genanki — Anki deck construction.
OpenAI — TTS and image generation.
ElevenLabs — premium multilingual voices.
