From e33b351b43ae99a077c51e51cd63f55a2e945acd Mon Sep 17 00:00:00 2001
From: Bart Turczynski <142225707+bart-turczynski@users.noreply.github.com>
Date: Tue, 16 Jun 2026 21:09:21 +0200
Subject: [PATCH] docs: document UTS #39 confusables/homograph as explicit non-goal (PUNY-mjdrwxne)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

host_normalize() is not a safety gate. Add a normative Non-goals section
to the normalization contract and a Non-goals section to the README
stating that spoof / homograph / mixed-script / display-safety detection
(UTS #39 / UTR #36) is explicitly not part of punycoder's acceptance
criteria — "not part of the criteria", not "never relevant".
Cross-reference UTS #46's own recommendation to apply those checks as
application/UI-layer steps upstack.

No behaviour change.

Co-Authored-By: Claude Opus 4.8
---
 README.Rmd                     | 18 ++++++++++++++++++
 README.md                      | 20 ++++++++++++++++++++
 docs/normalization-contract.md | 22 +++++++++++++++++++++-
 3 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/README.Rmd b/README.Rmd
index fc7cad6..b03a377 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -151,6 +151,24 @@ validate_domain(c("valid.com", "invalid..domain"))
 - Build-time backend selection (`libidn2` when present, built-in fallback otherwise)
 - Best-effort structured host extraction where invalid inputs are returned as missing components
 
+## Non-goals
+
+`punycoder` is a standards primitive for Punycode and host normalization. It is
+deliberately agnostic about resolvability and safety; the following are **not**
+part of its acceptance criteria:
+
+- **No spoof / homograph / mixed-script / display-safety detection.**
+  `host_normalize()` is not a safety gate — a successful result says the host is
+  valid and normalized under the pinned UTS #46 profile, nothing about whether it
+  is visually safe or non-deceptive. Confusable and restriction-level checks
+  (UTS #39 / UTR #36, which UTS #46 itself recommends only as application/UI-layer
+  steps) belong upstack.
+- **No URL canonicalization.** The `url_*` / `parse_url()` helpers do best-effort
+  host rewriting only (see above), not RFC 3986 / WHATWG URL parsing.
+- **No DNS resolvability or registrability / PSL classification.**
+
+These opinions belong in higher layers that consume punycoder's host functions.
+
 ## Acknowledgments
 
 - Core C++/R integration is powered by `Rcpp`.
diff --git a/README.md b/README.md
index 3d4f91c..25f87c4 100644
--- a/README.md
+++ b/README.md
@@ -166,6 +166,26 @@ validate_domain(c("valid.com", "invalid..domain"))
 - Best-effort structured host extraction where invalid inputs are
   returned as missing components
 
+## Non-goals
+
+`punycoder` is a standards primitive for Punycode and host
+normalization. It is deliberately agnostic about resolvability and
+safety; the following are **not** part of its acceptance criteria:
+
+- **No spoof / homograph / mixed-script / display-safety detection.**
+  `host_normalize()` is not a safety gate — a successful result says the
+  host is valid and normalized under the pinned UTS \#46 profile,
+  nothing about whether it is visually safe or non-deceptive. Confusable
+  and restriction-level checks (UTS \#39 / UTR \#36, which UTS \#46
+  itself recommends only as application/UI-layer steps) belong upstack.
+- **No URL canonicalization.** The `url_*` / `parse_url()` helpers do
+  best-effort host rewriting only (see above), not RFC 3986 / WHATWG URL
+  parsing.
+- **No DNS resolvability or registrability / PSL classification.**
+
+These opinions belong in higher layers that consume punycoder’s host
+functions.
+
 ## Acknowledgments
 
 - Core C++/R integration is powered by `Rcpp`.
diff --git a/docs/normalization-contract.md b/docs/normalization-contract.md
index 922767a..195b167 100644
--- a/docs/normalization-contract.md
+++ b/docs/normalization-contract.md
@@ -49,7 +49,27 @@ Out of scope (the caller's concern, e.g. `pslr`):
 - IP-address-literal detection and rejection — under STD3 rules `1.2.3.4`
   normalizes to `1.2.3.4`; rejecting IP literals is the caller's job;
 - the policy for what to *do* with an invalid element (return `NA` vs. abort) —
-  this function reports invalidity; the caller chooses the policy.
+  this function reports invalidity; the caller chooses the policy;
+- **spoof / homograph / mixed-script / display-safety detection** — see the
+  Non-goals section below.
+
+### Non-goals: spoofing and display safety (normative)
+
+`host_normalize()` is **not a safety gate.** Confusable, homograph,
+mixed-script, and display-safety detection — the concerns of **UTS #39**
+(Unicode Security Mechanisms) and **UTR #36** (Unicode Security
+Considerations) — are explicitly **not part of this function's acceptance
+criteria.** A successful (non-`NA`) result asserts only that the host is valid
+and normalized under the pinned UTS #46 profile; it asserts **nothing** about
+whether the host is visually safe, non-deceptive, or distinguishable from
+another host.
+
+This is a deliberate scope boundary, not an oversight. UTS #46 §6 itself
+*recommends* applying UTR #36 / UTS #39 confusable and restriction-level checks
+as additional **application/UI-layer** steps on top of normalization — which is
+precisely the argument for placing them upstack (in `rurl` or a dedicated
+policy layer), not inside the normalization primitive. "Not part of the
+acceptance criteria" means not in punycoder, not "never relevant".
 
 ## 2. Signature