Merged
18 changes: 18 additions & 0 deletions README.Rmd
@@ -151,6 +151,24 @@ validate_domain(c("valid.com", "invalid..domain"))
- Build-time backend selection (`libidn2` when present, built-in fallback otherwise)
- Best-effort structured host extraction where invalid inputs are returned as missing components

## Non-goals

`punycoder` is a standards primitive for Punycode and host normalization. It is
deliberately agnostic about resolvability and safety; the following are **not**
part of its acceptance criteria:

- **No spoof / homograph / mixed-script / display-safety detection.**
`host_normalize()` is not a safety gate: a successful result says only that the
host is valid and normalized under the pinned UTS #46 profile, and nothing about
whether it is visually safe or non-deceptive. Confusable and restriction-level
checks (UTS #39 / UTR #36, which UTS #46 itself recommends only as
application/UI-layer steps) belong upstack.
- **No URL canonicalization.** The `url_*` / `parse_url()` helpers do best-effort
host rewriting only (see above), not RFC 3986 / WHATWG URL parsing.
- **No DNS resolvability or registrability / PSL classification.**

These policy decisions belong in higher layers that consume punycoder's host functions.

## Acknowledgments

- Core C++/R integration is powered by `Rcpp`.
20 changes: 20 additions & 0 deletions README.md
@@ -166,6 +166,26 @@ validate_domain(c("valid.com", "invalid..domain"))
- Best-effort structured host extraction where invalid inputs are
returned as missing components

## Non-goals

`punycoder` is a standards primitive for Punycode and host
normalization. It is deliberately agnostic about resolvability and
safety; the following are **not** part of its acceptance criteria:

- **No spoof / homograph / mixed-script / display-safety detection.**
`host_normalize()` is not a safety gate: a successful result says only
that the host is valid and normalized under the pinned UTS \#46
profile, and nothing about whether it is visually safe or
non-deceptive. Confusable and restriction-level checks (UTS \#39 /
UTR \#36, which UTS \#46 itself recommends only as
application/UI-layer steps) belong upstack.
- **No URL canonicalization.** The `url_*` / `parse_url()` helpers do
best-effort host rewriting only (see above), not RFC 3986 / WHATWG URL
parsing.
- **No DNS resolvability or registrability / PSL classification.**

These policy decisions belong in higher layers that consume punycoder’s
host functions.

## Acknowledgments

- Core C++/R integration is powered by `Rcpp`.
22 changes: 21 additions & 1 deletion docs/normalization-contract.md
@@ -49,7 +49,27 @@ Out of scope (the caller's concern, e.g. `pslr`):
- IP-address-literal detection and rejection — under STD3 rules `1.2.3.4`
normalizes to `1.2.3.4`; rejecting IP literals is the caller's job;
- the policy for what to *do* with an invalid element (return `NA` vs. abort) —
this function reports invalidity; the caller chooses the policy.
this function reports invalidity; the caller chooses the policy;
- **spoof / homograph / mixed-script / display-safety detection** — see the
Non-goals section below.
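
The caller-side items in the list above can be sketched as a thin wrapper. This is a minimal illustration, not part of punycoder: `normalize_for_caller()`, `is_ipv4_literal()`, and `toy_normalize()` are hypothetical names, and the stand-in normalizer only assumes the contract's stated behavior that an invalid element comes back as `NA`. IPv6 literals are not handled here.

```r
# Hypothetical caller-side helper (not part of punycoder): dotted-quad check.
is_ipv4_literal <- function(host) {
  grepl("^[0-9]{1,3}(\\.[0-9]{1,3}){3}$", host)
}

# Hypothetical wrapper. `normalize` is any function that returns
# NA_character_ for invalid hosts (assumed here from the contract text).
normalize_for_caller <- function(hosts, normalize,
                                 on_invalid = c("na", "abort")) {
  on_invalid <- match.arg(on_invalid)
  out <- normalize(hosts)

  # Caller policy 1: reject IPv4 literals, which normalization lets through.
  out[!is.na(out) & is_ipv4_literal(out)] <- NA_character_

  # Caller policy 2: decide what an invalid element means (NA vs. abort).
  if (on_invalid == "abort" && anyNA(out)) {
    stop("invalid host(s): ", paste(hosts[is.na(out)], collapse = ", "))
  }
  out
}

# Stand-in normalizer for illustration only; it is NOT host_normalize().
toy_normalize <- function(x) {
  x <- tolower(x)
  x[grepl("..", x, fixed = TRUE)] <- NA_character_  # empty label is invalid
  x
}

res <- normalize_for_caller(
  c("Example.COM", "1.2.3.4", "invalid..domain"),
  toy_normalize
)
```

With `on_invalid = "abort"` the same call raises an error instead of returning `NA` elements, which keeps the policy choice entirely on the caller's side.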

### Non-goals: spoofing and display safety (normative)

`host_normalize()` is **not a safety gate.** Confusable, homograph,
mixed-script, and display-safety detection — the concerns of **UTS #39**
(Unicode Security Mechanisms) and **UTR #36** (Unicode Security
Considerations) — are explicitly **not part of this function's acceptance
criteria.** A successful (non-`NA`) result asserts only that the host is valid
and normalized under the pinned UTS #46 profile; it asserts **nothing** about
whether the host is visually safe, non-deceptive, or distinguishable from
another host.

This is a deliberate scope boundary, not an oversight. UTS #46 §6 itself
*recommends* applying UTR #36 / UTS #39 confusable and restriction-level checks
as additional **application/UI-layer** steps on top of normalization, which is
precisely the argument for placing them upstack (in `rurl` or a dedicated
policy layer) rather than inside the normalization primitive. "Not part of the
acceptance criteria" means "out of scope for punycoder", not "never relevant".
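
To make the layering concrete, here is a rough sketch of what an upstack check could look like. It is an assumption-laden illustration, not a UTS #39 implementation: `script_of()`, `label_is_mixed_script()`, and `host_has_mixed_script_label()` are hypothetical names, the script buckets are hand-picked code point ranges rather than the Unicode data files, and the input is assumed to be the Unicode (not `xn--`) form of an already normalized host.

```r
# Crude script bucket by code point range (illustrative only; a real policy
# layer would use the UTS #39 data files instead of these ranges).
script_of <- function(cp) {
  if ((cp >= 0x41 && cp <= 0x5A) || (cp >= 0x61 && cp <= 0x7A) ||
      (cp >= 0xC0 && cp <= 0x24F)) return("Latin")
  if (cp >= 0x370 && cp <= 0x3FF) return("Greek")
  if (cp >= 0x400 && cp <= 0x4FF) return("Cyrillic")
  "Other"  # digits, hyphen, and anything not bucketed above
}

# TRUE when one label mixes two or more of the bucketed scripts.
label_is_mixed_script <- function(label) {
  cps <- utf8ToInt(enc2utf8(label))
  scripts <- unique(vapply(cps, script_of, character(1)))
  length(setdiff(scripts, "Other")) > 1
}

# Applies the per-label check to a single host string.
host_has_mixed_script_label <- function(host) {
  labels <- strsplit(host, ".", fixed = TRUE)[[1]]
  any(vapply(labels, label_is_mixed_script, logical(1)))
}

plain    <- host_has_mixed_script_label("example.com")
mixed    <- host_has_mixed_script_label("\u0430pple.com")          # Cyrillic a + Latin
cyrillic <- host_has_mixed_script_label("\u0430\u0431\u0432.com")  # single-script label
```

A single-script label passes this check even when it is confusable with a Latin one, which is one reason such policy needs a dedicated layer with proper confusable data instead of an ad hoc test inside the normalizer.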

## 2. Signature
