Add additional comment escaping patterns#162

Closed: chasefleming wants to merge 1 commit into main from cf/comment-escaping

@chasefleming

Owner
  • Added missing comment escaping patterns:
    • -- (double dashes anywhere) → --
    • \x00 (null characters) → �
    • < (in comments) → &lt;

@chasefleming

Owner Author

@whisk Do you mind reviewing this for me, since you're familiar with it? I added some additional items from the spec.

chasefleming changed the title from "Add additional comment escaping" to "Add additional comment escaping patterns" on Jul 25, 2025
@whisk

whisk commented Jul 25, 2025

Contributor

I've checked some comment variations with an HTML5 validator and the older W3C validator.

TL;DR

-- should be escaped.
There is no real need to escape < even for older versions of HTML (in case someone uses the package to generate pieces of html for older doctypes).
There is no need to escape null characters in comments, but it might be useful to escape them everywhere.

More details

Double dashes
Double dashes are allowed in HTML5 but raise a warning for XML:

<!DOCTYPE html>
<html lang="en">
<head><meta charset="UTF-8"><title>Elem Page</title></head>
<body><p>Welcome to Elem!</p></body>
<!-- comment with -- in it --> 
</html>

Warning: The document is not mappable to XML 1.0 due to two consecutive hyphens in a comment.

HTML 4.01 Strict also doesn't like --.

The < character

< is allowed in HTML5 by the specs, and this document is valid:

<!DOCTYPE html>
<html lang="en">
<head><meta charset="UTF-8"><title>Elem Page</title></head>
<body><p>Welcome to Elem!</p></body>
<!-- comment with < in it --> 
</html>

XHTML 1.0 Strict and HTML 4.01 Strict also find < valid.

Null character

I believe null characters are explicitly not allowed in HTML, let alone comments: https://html.spec.whatwg.org/multipage/parsing.html#data-state

U+0000 NULL
This is an unexpected-null-character parse error. Emit the current input character as a character token.
