Summary
Improve TM lookup to handle the case where the TM contains full sentences but the source document is segmented into smaller fragments (or the reverse).
Example given by @danerx77:
- TM entry:
Which heading do you want to read? → Który nagłówek chcesz przeczytać?
- Source document segments:
Which heading / do you want to read?
- Desired matches:
"Który nagłówek" for the first segment and "chcesz przeczytać?" for the second
Other CAT tools call this fragment matching, subsegment leveraging, or fragment recall (Trados has AutoSuggest fragments, for example). It is distinct from concordance lookup: concordance is a manual search for substrings, whereas this feature automatically surfaces partial alignments at TM-lookup time.
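A minimal sketch of the "partial alignment" step, assuming each TM entry carries a word alignment between its source and target tokens. The issue does not say where alignments would come from; here one is written by hand, and in practice it would presumably come from an external word aligner. The sketch finds the document fragment as a contiguous token run in the TM source and projects that span onto the target through the alignment links. It accepts the target span only if no target word inside it is also aligned to a source word outside the fragment, which is the usual consistency check from phrase extraction in statistical MT. `fragment_match`, `tokenize`, `detokenize` and the alignment format are all hypothetical.

```python
import re
from typing import List, Optional, Sequence, Tuple

# Hypothetical alignment format: (source_token_index, target_token_index) pairs.
Alignment = Sequence[Tuple[int, int]]


def tokenize(text: str) -> List[str]:
    # Naive tokenizer: Unicode word runs and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


def detokenize(tokens: List[str]) -> str:
    # Insert a space only before word tokens, so punctuation attaches to the previous word.
    out = ""
    for tok in tokens:
        if out and re.match(r"\w", tok):
            out += " "
        out += tok
    return out


def find_span(haystack: List[str], needle: List[str]) -> Optional[Tuple[int, int]]:
    # Case-insensitive search for needle as a contiguous run; returns inclusive (start, end).
    hay = [t.lower() for t in haystack]
    ndl = [t.lower() for t in needle]
    for i in range(len(hay) - len(ndl) + 1):
        if ndl and hay[i:i + len(ndl)] == ndl:
            return i, i + len(ndl) - 1
    return None


def fragment_match(fragment: str, tm_source: str, tm_target: str,
                   alignment: Alignment) -> Optional[str]:
    src, tgt = tokenize(tm_source), tokenize(tm_target)
    span = find_span(src, tokenize(fragment))
    if span is None:
        return None  # the fragment does not occur in this TM entry
    start, end = span
    # Project the source span onto the target through the alignment links.
    tgt_idx = [t for s, t in alignment if start <= s <= end]
    if not tgt_idx:
        return None  # nothing aligned, so there is no target text to suggest
    lo, hi = min(tgt_idx), max(tgt_idx)
    # Consistency check: no target token in [lo, hi] may be aligned outside the source span.
    for s, t in alignment:
        if lo <= t <= hi and not (start <= s <= end):
            return None
    return detokenize(tgt[lo:hi + 1])


if __name__ == "__main__":
    tm_src = "Which heading do you want to read?"
    tm_tgt = "Który nagłówek chcesz przeczytać?"
    # Hand-written alignment for illustration: "you"/"want" -> "chcesz", "read" -> "przeczytać".
    links = [(0, 0), (1, 1), (3, 2), (4, 2), (6, 3), (7, 4)]
    print(fragment_match("Which heading", tm_src, tm_tgt, links))         # Który nagłówek
    print(fragment_match("do you want to read?", tm_src, tm_tgt, links))  # chcesz przeczytać?
    print(fragment_match("heading do you", tm_src, tm_tgt, links))        # None
```

With this alignment, the two segments from the example above map to the desired Polish fragments. "heading do you" is rejected because "chcesz" is also aligned to "want", which lies outside that fragment. The regex tokenizer is deliberately naive (for example, it does not handle opening brackets or quotes when detokenizing).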
The implementation would likely involve indexing TM entries at the word/phrase level and aligning on partial matches. This is non-trivial and worth a design discussion before implementation.
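For the "indexing at the word/phrase level" part, one possible shape is an inverted index from source-side n-grams to TM entry ids. This is an assumption for illustration, not a decided design. A fragment lookup intersects the postings of all of the fragment's n-grams. The result is only a candidate filter: for fragments longer than `max_n`, an entry can contain every n-gram without containing the fragment as one contiguous run, so candidates would still need a contiguous-match check and a target-side alignment step. `NgramIndex`, `max_n` and `candidates` are hypothetical names.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    # Naive tokenizer, lowercased so lookups are case-insensitive.
    return [t.lower() for t in re.findall(r"\w+|[^\w\s]", text)]


class NgramIndex:
    """Hypothetical phrase-level index: n-grams of TM source segments -> entry ids."""

    def __init__(self, max_n: int = 4) -> None:
        self.max_n = max_n
        self.entries: List[Tuple[str, str]] = []
        self.postings: Dict[Tuple[str, ...], Set[int]] = defaultdict(set)

    def add(self, source: str, target: str) -> int:
        entry_id = len(self.entries)
        self.entries.append((source, target))
        toks = tokenize(source)
        # Post every n-gram of length 1..max_n under this entry id.
        for n in range(1, self.max_n + 1):
            for i in range(len(toks) - n + 1):
                self.postings[tuple(toks[i:i + n])].add(entry_id)
        return entry_id

    def candidates(self, fragment: str) -> Set[int]:
        toks = tokenize(fragment)
        if not toks:
            return set()
        n = min(len(toks), self.max_n)
        # Copy the first posting set so the index itself is never mutated.
        result: Set[int] = set(self.postings.get(tuple(toks[:n]), set()))
        # Keep only entries that contain every n-gram of the fragment.
        for i in range(1, len(toks) - n + 1):
            result &= self.postings.get(tuple(toks[i:i + n]), set())
            if not result:
                break
        return result


if __name__ == "__main__":
    index = NgramIndex(max_n=3)
    index.add("Which heading do you want to read?", "Który nagłówek chcesz przeczytać?")
    index.add("Which file do you want to open?", "Który plik chcesz otworzyć?")
    print(index.candidates("Which heading"))         # {0}
    print(index.candidates("do you want"))           # {0, 1}
    print(index.candidates("do you want to read?"))  # {0}
```

Index size grows roughly with `max_n` times the number of source tokens, so `max_n` trades memory against how precise the candidate filter is.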
Originally requested by @danerx77 in a comment on #189. Tracking here as a separate issue so it can be prioritised on its own.