[Feature] Subsegment / fragment-level TM matching #193

@michaelbeijer

Summary

Improve TM lookup to handle the case where the TM contains full sentences but the source document is segmented into smaller fragments (or the reverse).

Example given by @danerx77:

  • TM entry: "Which heading do you want to read?" → "Który nagłówek chcesz przeczytać?"
  • Source document segments: "Which heading" / "do you want to read?"
  • Desired matches: "Który nagłówek" for the first segment, "chcesz przeczytać?" for the second

This is sometimes called fragment matching, subsegment leveraging, or fragment recall in other CAT tools (Trados has AutoSuggest fragments, for example). It's distinct from concordance lookup: concordance is a manual search for substrings, whereas this surfaces partial alignments automatically at TM-lookup time.

Implementation likely involves indexing TM entries at the word/phrase level and aligning on partial matches. This is non-trivial, so it is worth a design discussion before implementation.
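As a starting point for that discussion, here is a minimal sketch of the indexing half only, assuming a plain in-memory TM. All names (`FragmentIndex`, `FragmentHit`, `tokenize`) are hypothetical and not part of the existing codebase. It finds TM entries whose source contains the document segment as a contiguous run of words and reports where the run sits. It does not attempt the harder part, which is working out which slice of the target corresponds to that run (that would need some form of word alignment).

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentHit:
    """A TM entry whose source contains the queried segment (hypothetical type)."""
    tm_source: str
    tm_target: str
    start: int  # index of the first matched token in the TM source
    end: int    # index one past the last matched token


def tokenize(text: str) -> list[str]:
    # Assumption of this sketch: lowercase word tokens, punctuation dropped.
    return re.findall(r"\w+", text.lower())


class FragmentIndex:
    """Word-level inverted index over TM source segments (illustrative only)."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, str, list[str]]] = []
        self._by_token: dict[str, set[int]] = defaultdict(set)

    def add(self, source: str, target: str) -> None:
        tokens = tokenize(source)
        entry_id = len(self._entries)
        self._entries.append((source, target, tokens))
        for tok in tokens:
            self._by_token[tok].add(entry_id)

    def lookup(self, segment: str) -> list[FragmentHit]:
        query = tokenize(segment)
        if not query:
            return []
        # Candidate entries must contain every query token somewhere.
        candidates = set.intersection(
            *(self._by_token.get(tok, set()) for tok in query)
        )
        hits: list[FragmentHit] = []
        n = len(query)
        for entry_id in sorted(candidates):
            source, target, tokens = self._entries[entry_id]
            # Keep only entries where the tokens appear as a contiguous run.
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] == query:
                    hits.append(FragmentHit(source, target, i, i + n))
        return hits


index = FragmentIndex()
index.add("Which heading do you want to read?", "Który nagłówek chcesz przeczytać?")
first = index.lookup("Which heading")
second = index.lookup("do you want to read?")
```

With the example above, both document segments resolve to the same TM entry, at different token spans of its source. Turning those spans into "Który nagłówek" and "chcesz przeczytać?" is the open design question. The reverse case (TM holds fragments, document holds full sentences) would need the lookup inverted, which this sketch does not cover.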

Originally requested by @danerx77 in a comment on #189. Tracking here as a separate issue so it can be prioritised on its own.
