Bug: Complete paragraph text rewrite causes delete+reinsert instead of surgical replace #56

@sripathikrishnan

Parent Epic

#44

Bug

When a paragraph's text is completely rewritten (no shared tokens), the Jaccard similarity computed by the content alignment layer falls below the 0.3 threshold, so the old and new paragraphs are not matched. Instead of an in-place surgical replace, the reconciler emits a deleteContentRange covering the entire old paragraph, followed by an insertText for the new one.

Root Cause

Content alignment in content_align.py uses token-based Jaccard similarity to decide if two paragraphs are "matchable". When similarity < 0.3, the paragraphs are treated as unrelated — one is deleted, the other inserted.
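
To make the threshold behaviour concrete, here is a minimal sketch of a token-based Jaccard check. It is not the code from content_align.py: the names `tokenize`, `jaccard`, and `is_matchable` are hypothetical, and the tokenizer (lowercased word tokens) is an assumption. Only the 0.3 threshold comes from this issue.

```python
import re

MATCH_THRESHOLD = 0.3  # threshold quoted in this issue


def tokenize(text: str) -> set:
    """Hypothetical tokenizer: lowercased word tokens (the real one may differ)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B| over the two token sets."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        return 1.0  # two empty paragraphs are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_matchable(a: str, b: str, threshold: float = MATCH_THRESHOLD) -> bool:
    """Paragraphs are matched only when similarity reaches the threshold."""
    return jaccard(a, b) >= threshold
```

Under these assumptions, the two paragraphs from the Example section score 0.0 and are not matched, while a light edit such as "the quick brown fox" to "the quick red fox" scores 3/5 = 0.6 and is matched.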

Impact

  • Any paragraph style, formatting, or properties attached to the original paragraph are lost
  • The operation follows the delete+reinsert pattern, which the user wants to avoid
  • Paragraphs whose Jaccard similarity is at least 0.3 are unaffected: they are matched and replaced surgically

Example

  • Base: "The quick brown fox jumps\n" → 5 tokens
  • Desired: "A lazy dog sleeps quietly here\n" → 6 tokens, 0 overlap, so Jaccard = 0/11 = 0
  • Result: the entire paragraph is deleted and a new one is inserted (not surgical)
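
The difference between the two outcomes can be sketched as request payloads. This assumes deleteContentRange and insertText are the Google Docs API batchUpdate requests of those names, that the paragraph starts at index 1, and that a "surgical" replace means swapping the text while leaving the paragraph's trailing newline in place. The helper names are hypothetical, and the exact requests the reconciler emits may differ.

```python
def delete_and_reinsert(start: int, old_text: str, new_text: str) -> list:
    """Current behaviour as described here: remove the whole paragraph, then insert."""
    end = start + len(old_text)  # len() equals the index span for ASCII text only
    return [
        {"deleteContentRange": {"range": {"startIndex": start, "endIndex": end}}},
        {"insertText": {"location": {"index": start}, "text": new_text}},
    ]


def surgical_replace(start: int, old_text: str, new_text: str) -> list:
    """Assumed surgical variant: keep the trailing newline, so the paragraph survives."""
    old_body = old_text.rstrip("\n")
    new_body = new_text.rstrip("\n")
    end = start + len(old_body)
    return [
        {"deleteContentRange": {"range": {"startIndex": start, "endIndex": end}}},
        {"insertText": {"location": {"index": start}, "text": new_body}},
    ]


base = "The quick brown fox jumps\n"
desired = "A lazy dog sleeps quietly here\n"
full = delete_and_reinsert(1, base, desired)
surgical = surgical_replace(1, base, desired)
```

In this sketch the full variant deletes the range [1, 27), newline included, while the surgical variant deletes only [1, 26) and inserts text without a newline, so the paragraph element is never removed.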

Suggested Fix

Consider always matching paragraphs at the same position (positional fallback) even when Jaccard is low, especially when the document has a 1:1 paragraph correspondence. Alternatively, lower the threshold or use a secondary matching heuristic.
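
One possible shape for the positional fallback is sketched below. It is an illustration only: `align_paragraphs` and its helpers are hypothetical names, the tokenizer is assumed, and the simple in-order similarity pass stands in for whatever content_align.py actually does. When base and desired have the same number of paragraphs, it pairs them by position regardless of similarity.

```python
import re


def _tokens(text: str) -> set:
    # Assumed tokenizer: lowercased word tokens.
    return set(re.findall(r"\w+", text.lower()))


def _jaccard(a: str, b: str) -> float:
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def align_paragraphs(base: list, desired: list, threshold: float = 0.3) -> list:
    """Return (base_index, desired_index) pairs for paragraphs to edit in place."""
    # Positional fallback: with a 1:1 paragraph correspondence, trust positions
    # even when similarity is low, so a full rewrite still edits in place.
    if len(base) == len(desired):
        return [(i, i) for i in range(len(base))]

    # Otherwise fall back to an in-order, similarity-based match.
    pairs = []
    next_j = 0
    for i, paragraph in enumerate(base):
        for j in range(next_j, len(desired)):
            if _jaccard(paragraph, desired[j]) >= threshold:
                pairs.append((i, j))
                next_j = j + 1
                break
    return pairs
```

With the paragraphs from the Example section, `align_paragraphs(["The quick brown fox jumps\n"], ["A lazy dog sleeps quietly here\n"])` yields `[(0, 0)]`, so the pair would be handled as an in-place edit. The trade-off is that a document whose paragraphs were reordered without changing their count would also be paired by position, which may be worse than a similarity match in that case.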

xfail Tests (1)

  • TestCompleteTextRewrite::test_completely_different_text_is_surgical

Labels

bug (Something isn't working)
