/alternative_sentences: bookmark t_token_i can be stale against returned context_tokenized #618

## Summary

The `/alternative_sentences/<user_word_id>` endpoint returns bookmarks where `t_token_i` does not always point to `bookmark.from` (i.e. the target word) inside the `context_tokenized` that's shipped alongside it. Any frontend that uses the position to highlight or restore the bookmark ends up attaching it to the *wrong* word in the sentence.

## How it surfaces

The web frontend recently moved exercise highlighting from regex-based string matching to the same `previousBookmarks` / `updateTokensWithBookmarks` pathway `ArticleReader` uses for past_bookmarks. That path looks up the target token by `(sent_i, token_i)` and attaches the bookmark there.

For the *current* exercise context, positions are accurate and the new path works great: the cloze word renders with the chip-above + dotted-orange highlight + tap-to-pronounce, consistent with reading view.

For *alternative-sentence* contexts (reached via the left/right chevron navigation, which calls `/alternative_sentences/<user_word_id>` under the hood), some bookmarks come back with `t_token_i` pointing to a different word than `bookmark.from`. The frontend then either:

- attaches the bookmark to the wrong word (e.g. user sees "mené" highlighted instead of "poussière"), or
- silently fails the restoration and the target word renders plain.

## Concrete examples observed

- bookmark.from = `poussière`, `t_total_token: 1`, `t_token_i: 12` — but tokens[12] in the returned `context_tokenized` is `mené`.
- bookmark.from = `Geheimnis`, `t_total_token: 1`, `t_token_i` (some value) — but the target token at that position isn't `Geheimnis`.

A front-end console diagnostic confirmed the second failure mode: after bookmark restoration, the Word matching `findClozeWordIds(...)` ends up with `translation: null`, so the chip never appears.

## Expected behavior

For every bookmark `b` returned by `/alternative_sentences`:

`context_tokenized[para][b.t_sentence_i][b.t_token_i ... b.t_token_i + b.t_total_token - 1]`

should concatenate (case-insensitively, modulo punctuation) to `b.from`.
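
A minimal sketch of that invariant as a check that could run in a backend test or as a debug assertion. The data shape is an assumption, not taken from the codebase: `context_tokenized` is treated as paragraphs → sentences → token dicts with a `"text"` key, the bookmark is a plain dict, and `bookmark_position_is_valid` and `para_i` are hypothetical names.

```python
import unicodedata


def _normalize(text: str) -> str:
    """Casefold and drop whitespace and punctuation, so "Poussière," matches "poussière"."""
    text = unicodedata.normalize("NFC", text).casefold()
    return "".join(
        ch for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def bookmark_position_is_valid(bookmark: dict, context_tokenized: list, para_i: int = 0) -> bool:
    """Return True if the tokens addressed by the bookmark spell out bookmark["from"].

    Assumed (hypothetical) shape: context_tokenized[para][sentence][token]["text"].
    """
    sent_i = bookmark["t_sentence_i"]
    start = bookmark["t_token_i"]
    end = start + bookmark["t_total_token"]

    # Reject indices that fall outside the returned context.
    if not (0 <= para_i < len(context_tokenized)):
        return False
    paragraph = context_tokenized[para_i]
    if not (0 <= sent_i < len(paragraph)):
        return False
    sentence = paragraph[sent_i]
    if start < 0 or end > len(sentence) or end <= start:
        return False

    joined = "".join(tok["text"] for tok in sentence[start:end])
    return _normalize(joined) == _normalize(bookmark["from"])


# Made-up sentence, for illustration only.
context = [[[{"text": w} for w in ["Il", "a", "mené", "la", "poussière", "."]]]]
good = {"from": "poussière", "t_sentence_i": 0, "t_token_i": 4, "t_total_token": 1}
stale = {"from": "poussière", "t_sentence_i": 0, "t_token_i": 2, "t_total_token": 1}

print(bookmark_position_is_valid(good, context))   # True
print(bookmark_position_is_valid(stale, context))  # False: token 2 is "mené"
```

Running a check like this over every bookmark in the response would flag stale positions before they reach the frontend.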

## Suggested fix

In `generated_examples.py`, after generating/fetching the alternative example, recompute `t_token_i` (and `t_total_token`) against the *exact same* tokenization that gets serialized into `context_tokenized` for the response. The position computed against one tokenizer/run shouldn't be served alongside `context_tokenized` from another.
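
One possible shape for that recomputation, as a sketch only. It assumes the token texts of the target sentence are available as a list of strings taken from the same tokenization that is serialized into `context_tokenized`; `find_token_span` is a hypothetical helper, not an existing function in `generated_examples.py`.

```python
import unicodedata
from typing import Optional, Sequence, Tuple


def _normalize(text: str) -> str:
    """Casefold and drop whitespace and punctuation before comparing."""
    text = unicodedata.normalize("NFC", text).casefold()
    return "".join(
        ch for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def find_token_span(word: str, tokens: Sequence[str]) -> Optional[Tuple[int, int]]:
    """Locate `word` in `tokens`; return (t_token_i, t_total_token) or None.

    Matches the first contiguous run of tokens whose normalized
    concatenation equals the normalized word.
    """
    target = _normalize(word)
    if not target:
        return None
    normalized = [_normalize(t) for t in tokens]

    for start, first in enumerate(normalized):
        if not first:
            continue  # never start a span on a pure-punctuation token
        joined = ""
        for end in range(start, len(normalized)):
            joined += normalized[end]
            if joined == target:
                return start, end - start + 1
            if not target.startswith(joined):
                break  # this start position cannot match
    return None


# Made-up sentences, for illustration only.
print(find_token_span("poussière", ["Il", "a", "mené", "la", "poussière", "."]))  # (4, 1)
print(find_token_span("l'eau", ["de", "l'", "eau"]))                              # (1, 2)
print(find_token_span("Geheimnis", ["Il", "a", "mené"]))                          # None
```

This returns only the first occurrence; if the target word can appear more than once in a sentence, the caller would need an extra tie-break (for example, generation-time knowledge of which occurrence is the cloze word).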

## Workaround in frontend (not great)

Detect failed restoration and fall back to a string-search lookup against `bookmark.from`. This papers over the data issue but makes every consumer reinvent the same fallback. Better to fix the data at the source.

## Impact

User-visible: when navigating with the chevrons in an exercise, some alternative contexts either highlight the wrong word or get no highlight + chip at all, which looks inconsistent with the original context.
