fix(ansi): treat wide characters as breakable in Wrap and Wordwrap#827

Open
tzengyuxio wants to merge 2 commits into charmbracelet:main from tzengyuxio:fix/cjk-wrap-breakable

Conversation

@tzengyuxio

Summary

Problem

Previously, consecutive wide characters were accumulated into the word buffer and treated as a single long "word". This caused suboptimal wrapping for CJK text:

// width=36, input: "...這個 vault 不只是記事本,而是一個可瀏覽..."

// Before: "vault" alone on a line, CJK text hard-wrapped separately
模板、屬性與 Bases 結合後,這個
vault
不只是記事本,而是一個可瀏覽、可過濾
、可重組的個人知識資料庫。

// After: "vault" shares line with CJK, natural character-by-character wrapping
模板、屬性與 Bases 結合後,這個
vault 不只是記事本,而是一個可瀏覽、
可過濾、可重組的個人知識資料庫。

Fix

When a wide character (width > 1) is encountered, flush the current word buffer and write the character directly to output. This allows line breaks between any two wide characters, producing more natural wrapping for CJK text while preserving word boundaries for narrow (Latin) characters.
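
The rule above can be illustrated with a minimal, standalone sketch. It is not the code from this PR: `runeWidth`, `wrapSketch`, and `emit` are hypothetical names, width detection is approximated with standard-library script tables instead of a proper grapheme width lookup, and ANSI sequences, newlines, consecutive spaces, and words longer than the limit are ignored for brevity.

```go
package main

import (
	"strings"
	"unicode"
)

// runeWidth is a hypothetical stand-in for a real width lookup. It reports 2
// for common CJK scripts, CJK punctuation, and fullwidth forms, and 1 otherwise.
func runeWidth(r rune) int {
	if unicode.In(r, unicode.Han, unicode.Hiragana, unicode.Katakana, unicode.Hangul) ||
		(r >= 0x3000 && r <= 0x303F) || (r >= 0xFF01 && r <= 0xFF60) {
		return 2
	}
	return 1
}

// wrapSketch wraps s at limit cells. Narrow runes accumulate into a word;
// a wide rune flushes the word and is emitted as its own breakable unit.
func wrapSketch(s string, limit int) string {
	var out, word strings.Builder
	lineW, wordW := 0, 0
	pendingSpace := false

	// emit places one unbreakable unit of width w, breaking the line first
	// when the unit (plus a pending space) would not fit.
	emit := func(text string, w int) {
		spaceW := 0
		if pendingSpace && lineW > 0 {
			spaceW = 1
		}
		if lineW > 0 && lineW+spaceW+w > limit {
			out.WriteByte('\n')
			lineW, spaceW = 0, 0 // the pending space is dropped at the break
		}
		if spaceW > 0 {
			out.WriteByte(' ')
			lineW++
		}
		out.WriteString(text)
		lineW += w
		pendingSpace = false
	}

	// flushWord emits the accumulated narrow-character word, if any.
	flushWord := func() {
		if wordW > 0 {
			emit(word.String(), wordW)
			word.Reset()
			wordW = 0
		}
	}

	for _, r := range s {
		w := runeWidth(r)
		switch {
		case r == ' ':
			flushWord()
			pendingSpace = true
		case w > 1:
			flushWord()        // end the current narrow word
			emit(string(r), w) // wide rune goes straight to the output
		default:
			word.WriteRune(r)
			wordW += w
		}
	}
	flushWord()
	return out.String()
}
```

With this sketch, `wrapSketch("vault 不只是記事本", 10)` keeps `vault` on the first line and fills the remaining cells with wide characters before breaking, returning `"vault 不只\n是記事本"`.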

This fix benefits all consumers including glamour (via lipgloss.Wrap) and glow.

Related: charmbracelet/glamour#185

Test plan

  • All existing tests pass
  • Updated 2 test expectations to reflect improved CJK wrapping behavior
  • Verified with mixed CJK/Latin text at various widths

Wide characters (CJK ideographs, fullwidth chars) can be broken between
any two characters per Unicode line breaking rules (UAX #14). Previously,
consecutive wide characters were accumulated into the word buffer and
treated as a single long "word", causing suboptimal wrapping:

- Mixed CJK/Latin text like "vault 不只是記事本" would push the entire
  CJK sequence to the next line instead of filling the current line
- CJK text only got hard-wrapped when the word exceeded the limit,
  wasting space at the end of lines

Now wide characters (width > 1) are written directly to the output
buffer, allowing line breaks between any two wide characters. This
produces more natural wrapping for CJK text while preserving word
boundaries for narrow (Latin) characters.
…ping

Check whether the wide character fits on the current line (including
pending space width) before flushing the space buffer. This prevents
spaces from being written to the line end when the next character
triggers a line break.
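
The ordering described in this commit message can be sketched in isolation as a small decision function. This is an illustration under stated assumptions, not the PR's implementation: `placeWide` and its parameters are hypothetical names, and widths are plain cell counts.

```go
package main

// placeWide decides how to place a wide character of width charW on a line
// that already holds lineW cells, with spaceW cells of buffered (not yet
// written) spaces, under a width limit.
//
// The fit check includes the buffered spaces and runs before they are
// flushed, so a character that forces a break never leaves spaces at the
// end of the previous line.
func placeWide(lineW, spaceW, charW, limit int) (breakFirst, writeSpaces bool) {
	if lineW+spaceW+charW > limit {
		// Does not fit: break the line and discard the buffered spaces.
		return true, false
	}
	// Fits: write any buffered spaces, then the character, on this line.
	return false, spaceW > 0
}
```

For example, `placeWide(4, 1, 2, 5)` reports that the line should break and the buffered space be dropped, while `placeWide(5, 1, 2, 10)` keeps the space and the character on the current line.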