Anita — AI-Powered Anki Deck Generator

Turn a plain CSV of word pairs into a rich, multimedia Anki deck with AI-generated native-like pronunciations and optional illustrations — in minutes, for any language pair.

Why Anita?

Language learners lose hours building decent flashcards by hand. Anita automates the tedious part — generating TTS audio (OpenAI or ElevenLabs) and optional DALL·E images — so you can focus on reviewing, not curating. Feed it a CSV, get back a .apkg you can import straight into Anki on desktop or mobile.

Features

  • CSV in, .apkg out — point it at a two-column CSV and get a ready-to-import Anki deck.
  • Pluggable TTS — OpenAI tts-1 by default, ElevenLabs multilingual v2 optional.
  • Optional illustrations — DALL·E 2 images auto-resized to 128×128 px for clean cards.
  • Local media cache — every generated asset is tracked in a local SQLite index, so repeat runs skip paid API calls and finish quickly. See Cache for location and lifecycle.
  • Language-agnostic — works for any source → target language pair.
  • Clean card template — distraction-free front/back with audio playback and image.

Quickstart

# Install
uv tool install anita-anki  # or: pipx install anita-anki

# Set credentials (or copy `.env.example` → `.env` and edit)
export OPENAI_API_KEY=sk-...
# Optional:
export ELEVENLABS_API_KEY=...

# Generate
anita generate examples/basics.csv my_deck.apkg --deck-name "My Vocabulary"

Import my_deck.apkg into Anki and start reviewing.

Installation

From PyPI (recommended)

uv tool install anita-anki
# or
pipx install anita-anki
# or
pip install anita-anki

The distribution is published as anita-anki on PyPI (the name anita was taken), but the import name and CLI are both anita.

From source (development)

git clone https://github.com/timpara/anita.git
cd anita
uv sync --all-extras
uv run anita --help

Usage

CLI

anita generate INPUT.csv OUTPUT.apkg [OPTIONS]

Common options:

Flag | Default | Description
--- | --- | ---
--deck-name | Anita Vocabulary | Deck name shown inside Anki.
--tts | openai | TTS provider: openai or elevenlabs.
--images / --no-images | --no-images | Generate DALL·E illustrations per card.
--voice-id | (elevenlabs preset) | ElevenLabs voice ID.
--verbose | false | Enable debug logging.

Run anita generate --help for the full list.

Python API

from anita import AnkiDeckGenerator

generator = AnkiDeckGenerator(
    deck_name="Italian Restaurant",
    tts_provider="elevenlabs",
    generate_images=True,
)
generator.generate_deck("examples/restaurant.csv", "restaurant.apkg")

CSV format

Two columns: source word (prompt side) and target word (answer side). Header row is optional and auto-detected.

apple,mela
house,casa
book,libro
water,acqua

Working examples live in examples/.
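
To make the format concrete, here is a minimal sketch of parsing a two-column file with an optional header. The function name read_pairs and the header heuristic (a first row made up of common column names) are assumptions for illustration; Anita's own detection rule may differ.

```python
import csv
import io


def read_pairs(text: str) -> list[tuple[str, str]]:
    """Parse two-column CSV text into (source, target) pairs.

    Hypothetical heuristic: the first row counts as a header only if
    every cell is one of a few common column names.
    """
    header_names = {"source", "target", "front", "back", "word", "translation"}
    # csv.reader yields [] for blank lines; drop those.
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if rows and {cell.strip().lower() for cell in rows[0]} <= header_names:
        rows = rows[1:]
    pairs = []
    for row in rows:
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row!r}")
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs
```

Running a check like this before a paid generation run catches rows with a stray comma or a missing column early.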

Configuration

API keys are read from environment variables. A .env file in the working directory is auto-loaded if present — the fastest way to get started is:

cp .env.example .env
# then edit .env with your real keys

.env is git-ignored; never commit it.

Variable | Required for
--- | ---
OPENAI_API_KEY | OpenAI TTS, DALL·E
ELEVENLABS_API_KEY | ElevenLabs TTS (optional)
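
The variable requirements above can be turned into a preflight check in your own scripts. This is a sketch only: missing_keys is a hypothetical helper, not part of Anita, and it assumes DALL·E images always use the OpenAI key, as the table states.

```python
import os

# Which variable each TTS provider needs, taken from the table above.
REQUIRED_KEYS = {
    "openai": ["OPENAI_API_KEY"],
    "elevenlabs": ["ELEVENLABS_API_KEY"],
}


def missing_keys(tts_provider: str, generate_images: bool, env=os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    needed = list(REQUIRED_KEYS[tts_provider])
    if generate_images and "OPENAI_API_KEY" not in needed:
        needed.append("OPENAI_API_KEY")  # DALL·E images use the OpenAI key
    return [name for name in needed if not env.get(name)]
```

An empty list means the chosen combination of provider and images has everything it needs.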

The cache index lives under your OS user-cache directory (resolved via platformdirs), so re-running on the same words makes no new API calls as long as the media files are still on disk. See Cache for details.

Cost estimate

Service | Use case | Model | Approximate cost
--- | --- | --- | ---
OpenAI | TTS | tts-1 | $0.015 / 1k characters
OpenAI | Image generation | DALL·E 2 | $0.020 / image (256×256)
ElevenLabs | Premium TTS | v2 | Per your subscription tier

A 500-word deck with audio-only (OpenAI) typically costs well under $0.50.
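
The arithmetic behind that figure can be sketched as follows. The prices are the ones listed above; the average number of synthesized characters per row is an assumption you should replace with a value measured from your own CSV, and estimate_cost is a hypothetical helper, not part of Anita.

```python
# Prices copied from the cost table above (USD).
TTS_USD_PER_1K_CHARS = 0.015
IMAGE_USD_EACH = 0.020


def estimate_cost(rows: int, avg_chars_per_row: float, images: bool = False) -> float:
    """Rough OpenAI cost for a fresh run with no cache hits."""
    tts = rows * avg_chars_per_row / 1000 * TTS_USD_PER_1K_CHARS
    img = rows * IMAGE_USD_EACH if images else 0.0
    return round(tts + img, 4)
```

At an assumed 20 characters per row, 500 rows come to about $0.15 for audio only and about $10.15 with images, so images dominate the bill.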

Runtime estimates

These are rough wall-clock figures for a fresh run (no cache hits) on a 100 Mbps connection. The dominant factor is the per-item round-trip latency to the provider API; CPU and disk are negligible. Cached items skip the network entirely and complete in milliseconds.

Rows | Providers | Typical wall-clock
--- | --- | ---
50 | gTTS only | ~30 s
50 | OpenAI TTS only | ~40 s
50 | OpenAI TTS + DALL·E 2 images | ~3–5 min
500 | gTTS only | ~5 min
500 | OpenAI TTS only | ~7 min
500 | OpenAI TTS + DALL·E 2 images | ~30–50 min

Tips:

  • The SQLite cache is populated per-item, so an interrupted run resumes cheaply.
  • Image generation is by far the slowest step — run audio-only first and add images on a second pass.
  • Provider rate limits (not your bandwidth) usually cap throughput; expect diminishing returns from parallelism.

Cache

Anita keeps a small SQLite index of previously generated media so that re-running anita generate on the same CSV skips paid API calls. The index stores only filename mappings: each (source text, target text) pair points to its image_fname and audio_fname. It does not store API keys, prompts, generated audio, image bytes, or any other content. The media files themselves live in the media/ directory you pass to the CLI.

Location

The database path is resolved by platformdirs.user_cache_dir("anita"):

OS | Default path
--- | ---
Linux | ~/.cache/anita/generated_cards.db
macOS | ~/Library/Caches/anita/generated_cards.db
Windows | %LOCALAPPDATA%\anita\anita\Cache\generated_cards.db

Respects XDG_CACHE_HOME on Linux.
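
For scripts that need to predict the location without importing platformdirs, the path rules above can be approximated with the standard library. This is a sketch that mirrors the documented defaults, not platformdirs itself; cache_db_path and its parameters are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath

DB_NAME = "generated_cards.db"


def cache_db_path(platform: str, env: dict[str, str], home: str) -> str:
    """Approximate the default DB path for a sys.platform-style value."""
    if platform == "linux":
        # XDG_CACHE_HOME overrides ~/.cache when it is set and non-empty.
        base = env.get("XDG_CACHE_HOME") or str(PurePosixPath(home) / ".cache")
        return str(PurePosixPath(base) / "anita" / DB_NAME)
    if platform == "darwin":
        return str(PurePosixPath(home) / "Library" / "Caches" / "anita" / DB_NAME)
    if platform == "win32":
        local = PureWindowsPath(env["LOCALAPPDATA"])
        return str(local / "anita" / "anita" / "Cache" / DB_NAME)
    raise ValueError(f"unsupported platform: {platform}")
```

When in doubt, anita cache path prints the authoritative location.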

Lifecycle and disk reconciliation

Anita checks each cached filename against media/ on every run. If you delete a file out of media/, the next anita generate will regenerate just that asset (audio and image are handled independently). Known-failed generations are remembered so a flaky provider doesn't get hammered on every retry.
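
The reconciliation step can be pictured roughly like this. The table name cards and its column layout are assumptions made for the example; the real schema may differ, and assets_to_regenerate is a hypothetical helper, not Anita's API.

```python
import sqlite3
from pathlib import Path


def assets_to_regenerate(conn: sqlite3.Connection, media_dir: Path) -> list[tuple[str, str, str]]:
    """Return (source, target, kind) for cached filenames missing on disk."""
    missing = []
    rows = conn.execute(
        "SELECT source, target, audio_fname, image_fname FROM cards"
    )
    for source, target, audio_fname, image_fname in rows:
        # Audio and image are checked independently, so losing one file
        # does not force regeneration of the other.
        for kind, fname in (("audio", audio_fname), ("image", image_fname)):
            if fname and not (media_dir / fname).is_file():
                missing.append((source, target, kind))
    return missing
```

A row whose image file was deleted but whose audio file survives yields a single ("…", "…", "image") entry, which matches the per-asset behavior described above.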

Clearing the cache

Anita ships a cache subcommand group:

anita cache path                          # print the DB path
anita cache show                          # table of cached (source, target, audio?, image?)
anita cache show --json                   # machine-readable output
anita cache clear --yes                   # delete the DB (prompts without --yes)
anita cache prune --missing-media media/  # drop rows whose media files are gone

If you prefer manual cleanup, remove the file directly:

# Linux
rm ~/.cache/anita/generated_cards.db

# macOS
rm ~/Library/Caches/anita/generated_cards.db

# Windows
Remove-Item "$env:LOCALAPPDATA\anita\anita\Cache\generated_cards.db"

Or pass a project-local cache path when using the Python API:

from pathlib import Path
from anita.cache import MediaCache
cache = MediaCache(db_path=Path("./anita-cache.db"))

Known limitations

  • OpenAI TTS 4096-char cap. The tts-1 endpoint rejects any request longer than 4096 characters. Anita's rows are typically short words or phrases, so this almost never bites — but if you pass long example sentences, split them first.
  • DALL·E 2 availability. DALL·E 2 is deprecated for new OpenAI accounts and may be unavailable depending on when your account was created. Existing accounts can still generate images; new accounts should prefer Stability AI.
  • ElevenLabs free-tier caps. The free tier has strict monthly character limits that a single large deck can exhaust. Audit your remaining quota before launching a 500-row run.
  • AnkiWeb sync. genanki produces a valid .apkg, but AnkiWeb cloud sync still requires you to import the file via the desktop Anki client at least once. There is no direct .apkg → AnkiWeb upload path.
  • Unicode normalization. Source/target strings are cached verbatim. A word written in NFC on one machine and NFD on another will produce distinct cache keys and regenerate media. If you move a deck between platforms, consider pre-normalizing your CSV with unicodedata.normalize("NFC", ...).
  • Non-deterministic .apkg bytes. genanki embeds timestamps, so two identical runs produce different archive bytes. Tracked in #38.
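
Two of the limitations above (the 4096-character TTS cap and Unicode normalization) can be handled in one preprocessing pass over the CSV. The sketch below uses only the standard library; normalize_csv is a hypothetical helper, and it only reports over-long rows rather than splitting them.

```python
import csv
import io
import unicodedata

OPENAI_TTS_MAX_CHARS = 4096  # limit quoted in the list above


def normalize_csv(text: str) -> tuple[str, list[int]]:
    """Return NFC-normalized CSV text and the 1-based numbers of rows
    containing a cell longer than the TTS limit."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    too_long = []
    for number, row in enumerate(csv.reader(io.StringIO(text, newline="")), start=1):
        cells = [unicodedata.normalize("NFC", cell) for cell in row]
        if any(len(cell) > OPENAI_TTS_MAX_CHARS for cell in cells):
            too_long.append(number)
        writer.writerow(cells)
    return out.getvalue(), too_long
```

Writing the normalized text back to disk before running anita generate keeps cache keys stable across machines; rows reported as too long still need to be split by hand.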

Contributing

Contributions welcome! See CONTRIBUTING.md for dev setup, coding style, and PR conventions. By participating you agree to the Code of Conduct.

To report a security issue, please see SECURITY.md.

License

Apache License 2.0 © 2024–present Anita contributors.

Supply chain

Anita takes a few concrete steps to be a well-behaved dependency:

  • PyPI OIDC trusted publishing — releases are uploaded from GitHub Actions without any long-lived API token.
  • Sigstore attestations — every wheel and sdist on PyPI is signed by pypa/gh-action-pypi-publish, so you can verify provenance.
  • CycloneDX SBOM — each GitHub Release ships an anita-v<version>-sbom.cdx.json bill of materials (CycloneDX 1.5) listing every locked dependency.
  • Secret scanning: gitleaks runs on every PR, plus GitHub-native push protection.
  • Dependency auditing: pip-audit scans uv.lock on every PR and weekly against OSV.dev.

See SECURITY.md for details and vulnerability reporting.

Acknowledgments

  • genanki — Anki deck construction.
  • OpenAI — TTS and image generation.
  • ElevenLabs — premium multilingual voices.
