Word-lesson audio cache ignores CEFR level → mixed-level word lessons #636

@mircealungu

Problem

A daily word lesson can mix CEFR levels across its word segments. Example from prod (lesson 863, a three_words_lesson): the three segments carry ['A1', 'A1', 'A2']. So an A1 learner can get a word whose example sentences were written for A2 and, because reuse is level-blind, in principle for any higher level such as B2.

Root cause

AudioLessonMeaning.find() keys the cache on meaning + teacher_language only — difficulty_level is not part of the lookup:

# zeeguu/core/model/audio_lesson_meaning.py:69
@classmethod
def find(cls, meaning, teacher_language=None):
    """Find a non-deprecated audio lesson for a specific meaning and teacher language."""
    query = cls.query.filter_by(meaning=meaning).filter(cls.deprecated_at.is_(None))
    if teacher_language:
        query = query.filter_by(teacher_language_id=teacher_language.id)
    return query.first()

So the first row ever generated for a (meaning, teacher_language) pair wins and is reused for every later user, regardless of their level. The difficulty_level column is written at generation time but never consulted on lookup.
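The "first row wins" behavior can be reproduced without a database. The following is a minimal sketch, not the Zeeguu code: `CachedMeaningAudio` and `find_level_blind` are hypothetical in-memory stand-ins for the model and for the filtering that `find()` does, and the word "hus" is made-up sample data.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CachedMeaningAudio:
    """Hypothetical stand-in for one AudioLessonMeaning row."""
    meaning: str
    teacher_language: str
    difficulty_level: str
    deprecated: bool = False


def find_level_blind(
    rows: Iterable[CachedMeaningAudio],
    meaning: str,
    teacher_language: Optional[str] = None,
) -> Optional[CachedMeaningAudio]:
    """Mirror the current lookup: match on meaning (+ teacher language) only."""
    for row in rows:
        if row.deprecated or row.meaning != meaning:
            continue
        if teacher_language and row.teacher_language != teacher_language:
            continue
        return row  # first match wins; difficulty_level is never checked
    return None


# A row generated earlier for a B2 learner...
cache = [CachedMeaningAudio("hus", "en", "B2")]

# ...is handed to any later learner, whatever their level.
hit_for_a1_learner = find_level_blind(cache, "hus", "en")
print(hit_for_a1_learner.difficulty_level)
```

Nothing in the lookup signature even accepts the learner's level, which is why the stored `difficulty_level` cannot influence the result.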

The meaning-audio script is genuinely level-dependent — the prompt says "the lesson is for somebody who is CEFR level {cefr_level} so ensure that sentences are of the appropriate difficulty." So a reused row really does carry the wrong difficulty for the new learner, not just a mislabeled tag.

Asymmetry worth noting

Dialogue lessons (topic/situation) do filter the cache by level — AudioLessonDialogue.find_unheard(...) includes difficulty_level=cefr_level. So dialogue lessons are level-correct; only meaning-audio reuse is level-blind. The fix would bring meaning-audio in line with how dialogues already work.

Why now

Surfaced while adding a per-lesson DailyAudioLesson.cefr_level() to the API responses (for the shared-lesson link preview). Because segments can disagree, that method currently reports the most common level across segments to mask the inconsistency — a workaround that would be unnecessary if word-audio were cached per level.
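For reference, the "most common level" workaround amounts to something like the sketch below. This is an assumption about its shape, not the actual body of DailyAudioLesson.cefr_level(); the function name `most_common_level` is hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def most_common_level(segment_levels: Sequence[str]) -> Optional[str]:
    """Return the level that occurs most often among a lesson's segments."""
    if not segment_levels:
        return None
    # most_common(1) yields a one-element list of (level, count) pairs.
    return Counter(segment_levels).most_common(1)[0][0]


# Levels taken from the prod example (lesson 863).
print(most_common_level(["A1", "A1", "A2"]))  # A1
```

The reported level hides the A2 segment rather than removing it, so the learner still hears it.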

Options

  1. Cache per level — add difficulty_level to AudioLessonMeaning.find(), matching the dialogue behavior. Costs more generation / less cross-user reuse, but each learner gets level-appropriate examples.
  2. Reuse across adjacent levels only (e.g. share within A1–A2, B1–B2) to keep some reuse while bounding the mismatch.
  3. Accept & document: the headword/translation are identical regardless of level; only example-sentence complexity differs. If that's deemed acceptable, stop treating level as a per-segment property and define the lesson level as the level of the user the lesson was generated for.
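Options 1 and 2 differ only in which stored levels count as a hit. A minimal in-memory sketch, under stated assumptions: `CachedMeaningAudio`, `acceptable_levels`, `find_for_level`, and the `share_within_band` flag are all hypothetical names, and the bands are the two given as examples in option 2. The real change would be an extra filter in AudioLessonMeaning.find().

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional

# Assumption: only the two bands named in option 2; other levels never share.
BANDS = (frozenset({"A1", "A2"}), frozenset({"B1", "B2"}))


@dataclass
class CachedMeaningAudio:
    """Hypothetical stand-in for one AudioLessonMeaning row."""
    meaning: str
    teacher_language: str
    difficulty_level: str
    deprecated: bool = False


def acceptable_levels(cefr_level: str, share_within_band: bool = False) -> FrozenSet[str]:
    """Option 1: only the exact level. Option 2: every level in the same band."""
    if share_within_band:
        for band in BANDS:
            if cefr_level in band:
                return band
    return frozenset({cefr_level})


def find_for_level(
    rows: Iterable[CachedMeaningAudio],
    meaning: str,
    teacher_language: str,
    cefr_level: str,
    share_within_band: bool = False,
) -> Optional[CachedMeaningAudio]:
    """Level-aware lookup: an exact-level row is preferred over a same-band row."""
    levels = acceptable_levels(cefr_level, share_within_band)
    candidates = [
        row for row in rows
        if not row.deprecated
        and row.meaning == meaning
        and row.teacher_language == teacher_language
        and row.difficulty_level in levels
    ]
    for row in candidates:
        if row.difficulty_level == cefr_level:
            return row
    return candidates[0] if candidates else None


cache = [
    CachedMeaningAudio("hus", "en", "A2"),
    CachedMeaningAudio("hus", "en", "B2"),
]

# Option 1: an A1 learner gets a miss, so a new A1 row would be generated.
strict_hit = find_for_level(cache, "hus", "en", "A1")

# Option 2: the same learner reuses the A2 row, but never the B2 one.
banded_hit = find_for_level(cache, "hus", "en", "A1", share_within_band=True)
```

A miss under option 1 is the cost mentioned above: it triggers a fresh generation where the level-blind lookup would have reused a row.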

Acceptance

  • Decide the caching key for meaning-audio.
  • If changing: update AudioLessonMeaning.find() + the regeneration path in daily_lesson_generator.generate_audio_lesson_meaning(), and consider whether existing rows need backfill/deprecation.

🤖 Generated with Claude Code
