
WIP: YPE-1824 - add Bible HTML transformer to core package #198

Open
cameronapak wants to merge 9 commits into main from ype-1814-react-sdk-create-bible-html-transformer-h

Conversation

Collaborator

@cameronapak cameronapak commented Mar 20, 2026

Move transformBibleHtml from UI to @youversion/platform-core as a
runtime-agnostic function with injected DOM adapters. UI wrapper now
delegates to core, keeping getFootnoteMarker and font constants.

Refs: YPE-1814

changeset-bot bot commented Mar 20, 2026

⚠️ No Changeset found

Latest commit: 6308fca

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@cameronapak cameronapak changed the title feat: add Bible HTML transformer to core package YPE-1824 - add Bible HTML transformer to core package Mar 20, 2026
Contributor

greptile-apps bot commented Mar 20, 2026

Greptile Summary

This PR moves Bible HTML transformation logic from the @youversion/platform-react-ui package into a new bible-html-transformer.ts module in @youversion/platform-core, making it runtime-agnostic. The transformer is now injected with DOM adapter callbacks (parseHtml/serializeHtml) so it can run in browsers (via native DOMParser), Node.js (via linkedom), and future runtimes. The verse.tsx component is refactored to source footnote data from embedded data-verse-footnote-content attributes instead of a separately-computed notes map, and isomorphic-dompurify is removed from the UI package.

Key concerns to address before merging:

  • require('linkedom') in an ESM package: packages/core has "type": "module", so require is not available in module scope. transformBibleHtmlForNode will throw ReferenceError at runtime in any ESM-first environment (Next.js App Router, Remix, Astro SSR). Replace with await import('linkedom'), but note this requires making the function async and updating all callers and tests.
  • ASCII apostrophe missing from NEEDS_SPACE_BEFORE regex: ' (U+0027) is absent from the negated character class in bible-html-transformer.ts:5, causing a spurious space to be inserted before possessives and contractions (e.g. God[fn]'s → God [fn] 's).
  • XSS sanitization removed with no replacement: isomorphic-dompurify is dropped and all four XSS security tests are deleted. dangerouslySetInnerHTML in VerseFootnoteButton now renders completely unsanitized HTML, and getVerseHtmlFromDom outputs raw clone.innerHTML into the same slot. The safety assumption "comes from our YouVersion APIs" is operational, not technical, and leaves consumers exposed if any API-side sanitization ever lapses.
  • getVerseHtmlFromDom O(N²) DOM cloning — called once per anchor in useLayoutEffect rather than once per unique verse, causing redundant querySelectorAll + cloneNode pairs for footnote-heavy passages.
  • linkedom shipped as a hard runtime dependency — adds ~500 KB to every consumer bundle even for browser-only use cases; consider optionalDependencies or a peer dep strategy.
  • No changeset included — this PR introduces breaking API changes (TransformedBibleHtml no longer has notes/rawHtml, VerseNotes type removed) that require a changeset entry per the monorepo's versioning policy.

Confidence Score: 2/5

  • Not safe to merge — a runtime-breaking require() in an ESM package, a regex bug that corrupts possessive text, and the removal of XSS sanitization without replacement must all be resolved first.
  • The architectural direction is sound and the test coverage is reasonable, but three correctness/security issues exist that would break the feature in production: (1) transformBibleHtmlForNode uses require() in an ESM package and will throw in Next.js App Router / Remix / Astro SSR environments, (2) the NEEDS_SPACE_BEFORE regex inserts spurious spaces before possessives and contractions, and (3) isomorphic-dompurify was removed with no replacement sanitization and all XSS tests were deleted. The PR is also marked WIP and is missing a changeset.
  • packages/core/src/bible-html-transformer.ts (ESM require + regex bug), packages/ui/src/components/verse.tsx (unsanitized dangerouslySetInnerHTML + O(N²) DOM cloning), packages/ui/src/components/verse.test.tsx (deleted XSS tests).

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/core/src/bible-html-transformer.ts | New 317-line runtime-agnostic HTML transformer. Has require('linkedom') ESM incompatibility at line 306, missing ASCII apostrophe in NEEDS_SPACE_BEFORE regex at line 5, and silent unsanitized fallback in transformBibleHtmlForBrowser. |
| packages/core/src/bible-html-transformer.node.test.ts | New Node-environment test suite using @vitest-environment node. Tests call transformBibleHtmlForNode synchronously, which will break if the function is changed to async to fix the ESM require() issue. |
| packages/core/src/bible-html-transformer.test.ts | New jsdom test suite covering verse wrapping, footnote extraction, table fixing, and nbsp insertion. Good coverage of happy-path scenarios; no negative-path or malformed-HTML tests. |
| packages/ui/src/components/verse.tsx | Major refactor removing DOMPurify sanitization and the notes data-flow through props; footnote data now sourced from data-verse-footnote-content attributes. getVerseHtmlFromDom outputs unsanitized clone.innerHTML into dangerouslySetInnerHTML, and is called once per anchor instead of once per unique verse (O(N²) clone loop). |
| packages/ui/src/components/verse.test.tsx | Removes 4 XSS security tests (script injection, onerror/onclick event handlers, javascript: URLs) with no replacement. Security regression surface is now untested. |
| packages/ui/src/lib/verse-html-utils.ts | Reduced from 413 lines to 3; all transformation logic moved to @youversion/platform-core. Only font constants remain; the file is kept to avoid breaking the FontFamily import in verse.tsx. |
| packages/core/package.json | Adds linkedom ^0.18.12 as a runtime dependency. This ships in the published package, adding ~500 KB to consumers who only need browser usage. Consider whether linkedom belongs in optionalDependencies or as a peer dependency instead. |
| packages/ui/package.json | Removes isomorphic-dompurify 2.23.0. Clean dependency removal; versions still aligned at 1.20.2 with packages/core. |

Sequence Diagram

sequenceDiagram
    participant App
    participant Verse.Html
    participant transformBibleHtmlForBrowser
    participant BibleTextHtml
    participant getVerseHtmlFromDom
    participant VerseFootnoteButton

    App->>Verse.Html: html (raw API HTML)
    Verse.Html->>transformBibleHtmlForBrowser: html
    Note over transformBibleHtmlForBrowser: wrapVerseContent()<br/>assignFootnoteKeys()<br/>replaceFootnotesWithAnchors()<br/>addNbspToVerseLabels()<br/>fixIrregularTables()
    transformBibleHtmlForBrowser-->>Verse.Html: { html: transformedHtml }
    Verse.Html->>BibleTextHtml: html=transformedHtml

    Note over BibleTextHtml: useLayoutEffect sets innerHTML<br/>queries [data-verse-footnote]

    BibleTextHtml->>BibleTextHtml: 1st pass: build notesByKey map<br/>(reads data-verse-footnote-content)
    BibleTextHtml->>getVerseHtmlFromDom: verseNum (once per anchor, not per verse)
    getVerseHtmlFromDom-->>BibleTextHtml: cloned+sanitized verse HTML
    BibleTextHtml->>BibleTextHtml: setFootnoteData([...])

    loop For each footnote anchor
        BibleTextHtml->>VerseFootnoteButton: portal into anchor element<br/>notes[], verseHtml, hasVerseContext
        VerseFootnoteButton-->>App: Popover with footnote content
    end

Comments Outside Diff (1)

  1. packages/ui/src/components/verse.test.tsx, line 841 (link)

    P1 XSS security tests removed without replacement

    Four security tests were deleted in this diff:

    • should remove script tags from HTML
    • should remove inline event handlers (onerror)
    • should remove inline event handlers (onclick)
    • should remove javascript: URLs

    These tests documented the XSS protection contract that Verse.Html provides to consumers. Removing them together with isomorphic-dompurify means there is now no automated verification that injected <script>, event handler attributes, or javascript: URLs are rendered harmlessly. Since dangerouslySetInnerHTML is still used in VerseFootnoteButton, the surface remains — the safety net has simply been removed.

    Even if the trust model is "the API is our own and is safe", regression tests for the XSS cases help catch future regressions (e.g. if API-side sanitization is accidentally disabled or a third-party translation introduces unexpected markup).

Last reviewed commit: "Refactor: Move getFo..."

Comment on lines +266 to +273
async getPassage(
  versionId: number,
  usfm: string,
  format: 'html',
  include_headings: boolean | undefined,
  include_notes: boolean | undefined,
  transform: TransformBibleHtmlOptions,
): Promise<TransformedBiblePassage>;
Contributor

P2 Overload forces explicit undefined for optional booleans

The second overload signature requires callers to spell out undefined for include_headings and include_notes whenever they want to use transform:

// Only way to reach the TransformedBiblePassage overload:
await bibleClient.getPassage(id, usfm, 'html', undefined, undefined, transform);

This is ergonomically awkward. An options-object signature for the transform variant — or at minimum making include_headings / include_notes truly optional with ? — would improve the call-site experience. This could also be addressed by accepting transform as part of an options bag (e.g. { transform, include_headings, include_notes }) so callers don't need positional undefined placeholders.
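
A minimal sketch of the options-bag shape suggested above. The names GetPassageOptions, PassageRequest and buildPassageRequest are hypothetical and exist only for illustration; the TransformBibleHtmlOptions interface here is a simplified stand-in, not the real type from the core package.

```typescript
// Hypothetical stand-in for the adapter options type named in the overload above.
interface TransformBibleHtmlOptions {
  parseHtml: (html: string) => unknown;
  serializeHtml: (doc: unknown) => string;
}

// Hypothetical options bag: every field is optional, so callers never
// need positional `undefined` placeholders.
interface GetPassageOptions {
  include_headings?: boolean;
  include_notes?: boolean;
  transform?: TransformBibleHtmlOptions;
}

// Hypothetical description of what a client would derive from the call.
interface PassageRequest {
  versionId: number;
  usfm: string;
  format: 'html';
  query: Record<string, string>;
  wantsTransform: boolean;
}

function buildPassageRequest(
  versionId: number,
  usfm: string,
  options: GetPassageOptions = {},
): PassageRequest {
  const query: Record<string, string> = {};
  // Only include the flags the caller actually set.
  if (options.include_headings !== undefined) {
    query['include_headings'] = String(options.include_headings);
  }
  if (options.include_notes !== undefined) {
    query['include_notes'] = String(options.include_notes);
  }
  return {
    versionId,
    usfm,
    format: 'html',
    query,
    wantsTransform: options.transform !== undefined,
  };
}
```

With this shape a call site that only wants the transform would read buildPassageRequest(id, usfm, { transform }), and the two boolean flags stay out of the way unless they are needed.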


Adds a new attribute to store the footnote content directly on the element,
improving accessibility and enabling easier styling of footnote content.

Adds `transformBibleHtmlForNode` to the core package, utilizing Linkedom
for DOM manipulation in Node.js environments. This allows for consistent
Bible HTML processing across server and browser.

Also refactors `transformBibleHtml` to accept DOM adapters, enabling
runtime-agnostic transformations. The browser-specific transformation is
now handled by `transformBibleHtmlForBrowser`.

The `isomorphic-dompurify` dependency is removed from
`@youversion/platform-ui`, as sanitization is no longer handled at the
UI layer but is now implicitly managed by the core transformer.
Comment on lines +252 to +265
it('should preserve footnote HTML structure in data-verse-footnote-content', () => {
const html = `
<div>
<div class="p">
<span class="yv-v" v="1"></span><span class="yv-vlbl">1</span>Text<span class="yv-n f"><span class="ft"><em>Emphasized</em> note</span></span>.</div>
</div>
`;

const result = transformBibleHtml(html, createAdapters());

const doc = new DOMParser().parseFromString(result.html, 'text/html');
const anchor = doc.querySelector('[data-verse-footnote="1"]');
expect(anchor).not.toBeNull();
const content = anchor!.getAttribute('data-verse-footnote-content');
Contributor

P2 Test name does not match what is actually asserted

The test is titled 'should not call sanitize when omitted', but the assertion checks that parseHtml was called (expect(called).toBe(true)). The test does not actually verify that the sanitize function is bypassed when omitted; it only confirms the pipeline still executes. A call to a non-existent sanitize would throw an error and also cause the test to pass (the error would surface before the assertion), so the title can mislead readers into thinking this is a no-sanitize-path coverage test.

Consider renaming the test and adding an explicit guard or spy:

Suggested change
it('should preserve footnote HTML structure in data-verse-footnote-content', () => {
const html = `
<div>
<div class="p">
<span class="yv-v" v="1"></span><span class="yv-vlbl">1</span>Text<span class="yv-n f"><span class="ft"><em>Emphasized</em> note</span></span>.</div>
</div>
`;
const result = transformBibleHtml(html, createAdapters());
const doc = new DOMParser().parseFromString(result.html, 'text/html');
const anchor = doc.querySelector('[data-verse-footnote="1"]');
expect(anchor).not.toBeNull();
const content = anchor!.getAttribute('data-verse-footnote-content');
it('should still run parseHtml when sanitize is not provided', () => {
const html = '<div>Test</div>';
let called = false;
transformBibleHtml(html, {
parseHtml: (h) => {
called = true;
return new DOMParser().parseFromString(h, 'text/html');
},
serializeHtml: (doc) => doc.body.innerHTML,
});
expect(called).toBe(true);
});

Collaborator Author

@cameronapak cameronapak left a comment

Devin Review found 5 potential issues.


Comment on lines +294 to +297
const transformedData = useMemo<TransformedBibleHtml>(
  () => transformBibleHtmlForBrowser(html),
  [html],
);
Collaborator Author

@cameronapak cameronapak Mar 20, 2026

🔴 HTML sanitization (DOMPurify) removed — unsanitized HTML rendered via dangerouslySetInnerHTML

This PR removes isomorphic-dompurify from packages/ui/package.json and eliminates all HTML sanitization from the transformation pipeline. The old transformBibleHtml in verse-html-utils.ts ran DOMPurify.sanitize(html, DOMPURIFY_CONFIG) before any processing. The new transformBibleHtmlForBrowser in core performs no sanitization at all.

The unsanitized HTML is then rendered at three injection points in verse.tsx:

  • contentRef.current.innerHTML = html (line 178)
  • dangerouslySetInnerHTML={{ __html: verseHtml }} (line 106)
  • dangerouslySetInnerHTML={{ __html: note }} (line 120)

This violates the repository's explicit review guideline at docs/review-guidelines.md:19: "Is XSS protection properly implemented (no dangerouslySetInnerHTML without sanitization)?"

Additionally, the biome-ignore comments on lines 105 and 119 still claim "HTML has been run through DOMPurify and is safe" — this is now factually incorrect since DOMPurify was removed. The XSS protection tests (script tag removal, event handler stripping, javascript: URL removal) were also deleted from verse.test.tsx.


Check for the existence of `globalThis.DOMParser` before attempting to
use it.
This prevents errors in environments where the DOMParser might not be
available.
This commit introduces linkedom as a dependency for the
`@youversion/platform-core` package, enabling the
`transformBibleHtmlForNode` function to correctly process and transform
Bible HTML in Node.js environments.

The `transformBibleHtmlForNode` function now correctly wraps HTML in
`<html><body>` tags before parsing to ensure proper serialization of
`doc.body.innerHTML`.

Dependency updates include adding `linkedom` and upgrading `htmlparser2`
and `entities` to their latest compatible versions. Test cases for
`bible-html-transformer.node.test.ts` have been adjusted to accommodate
potential attribute ordering differences in `linkedom`'s serialization
and to correctly assert the encoded non-breaking space.
The regular expression for determining if a space is needed before a
character was too restrictive and missed cases where apostrophes should
be preceded by a space in the transformed HTML. This commit expands the
character set to include single quotes, ensuring proper rendering.
Ensure that the bible-html-transformer does not crash if a footnote
element is missing its key attribute.
greptile-apps[bot]

This comment was marked as resolved.

The `transformBibleHtml` function and its variants have been refactored
to embed footnote data directly within the HTML as `data-verse-footnote`
and `data-verse-footnote-content` attributes. This eliminates the need
for the separate `notes` property in the return type, simplifying the
API and making the output HTML self-contained.

Tests have been updated to reflect these changes, focusing on the
presence and content of the new data attributes in the transformed HTML.
The `VerseNotes` type has been removed from the core package.
@cameronapak cameronapak changed the title YPE-1824 - add Bible HTML transformer to core package WIP: YPE-1824 - add Bible HTML transformer to core package Mar 20, 2026
Comment on lines +306 to +317
export function transformBibleHtmlForNode(html: string): TransformedBibleHtml {
  // eslint-disable-next-line @typescript-eslint/no-require-imports
  const { DOMParser } = require('linkedom') as {
    DOMParser: new () => { parseFromString(html: string, type: string): Document };
  };

  return transformBibleHtml(html, {
    parseHtml: (h: string) =>
      new DOMParser().parseFromString(`<html><body>${h}</body></html>`, 'text/html'),
    serializeHtml: (doc: Document) => doc.body.innerHTML,
  });
}
Contributor

P0 require() throws ReferenceError in ESM environments

packages/core/package.json declares "type": "module", making this package native ESM. When any bundler or Node.js runtime resolves the "import" export condition, require is not defined in module scope and calling transformBibleHtmlForNode will throw:

ReferenceError: require is not defined in ES module scope

This will affect Next.js App Router (uses the "import" condition on the server), Remix, Astro SSR, and any ESM-first environment.

Replace the synchronous require() with a dynamic import to make it ESM-compatible:

export async function transformBibleHtmlForNode(html: string): Promise<TransformedBibleHtml> {
  const { DOMParser } = await import('linkedom');
  return transformBibleHtml(html, {
    parseHtml: (h: string) =>
      new DOMParser().parseFromString(`<html><body>${h}</body></html>`, 'text/html'),
    serializeHtml: (doc: Document) => doc.body.innerHTML,
  });
}

Note this also changes the return type to Promise<TransformedBibleHtml>, which will require callers to await the result — a breaking change from the current synchronous API.

The node test (bible-html-transformer.node.test.ts) also needs to be updated to await the call.

Collaborator Author

@cameronapak cameronapak left a comment

Devin Review found 2 new potential issues.


Comment on lines +286 to +295
export function transformBibleHtmlForBrowser(html: string): TransformedBibleHtml {
  if (typeof globalThis.DOMParser === 'undefined') {
    return { html };
  }

  return transformBibleHtml(html, {
    parseHtml: (h) => new DOMParser().parseFromString(h, 'text/html'),
    serializeHtml: (doc) => doc.body.innerHTML,
  });
}
Collaborator Author

🔴 Core package exports browser-only DOMParser API, violating package boundary rules

transformBibleHtmlForBrowser directly uses the browser-only globalThis.DOMParser (line 287) and new DOMParser() (line 292). The core package's AGENTS.md explicitly states: "❌ Don't: Import React, window, document, or browser-only APIs directly". DOMParser is a browser-only API that doesn't exist in Node.js. While the function guards with a runtime check, the rule prohibits having browser APIs in core at all.

Similarly, transformBibleHtmlForNode at line 308 uses require('linkedom'), which is a Node.js-specific CJS API in a package with "type": "module". The core package should only export the generic transformBibleHtml with its adapter pattern; the environment-specific convenience functions belong in the respective consuming packages (UI for browser, a server package for Node).

Prompt for agents
Move `transformBibleHtmlForBrowser` out of packages/core/src/bible-html-transformer.ts and into the UI package (e.g., packages/ui/src/lib/verse-html-utils.ts or a new file), since it uses browser-only DOMParser. Similarly, move `transformBibleHtmlForNode` (lines 306-317) to a server-side package or keep it as documentation/example only. The core package should only export the generic `transformBibleHtml` function and its types (`TransformBibleHtmlOptions`, `TransformedBibleHtml`). Update packages/core/src/index.ts to stop exporting the environment-specific functions, and update packages/ui/src/components/verse.tsx to import from the local wrapper instead of @youversion/platform-core.

type VerseNotes,
} from '@/lib/verse-html-utils';
import { type FontFamily, getFootnoteMarker } from '@/lib/verse-html-utils';
import { transformBibleHtmlForBrowser } from '@youversion/platform-core';
Collaborator Author

🔴 UI component imports directly from @youversion/platform-core instead of using local re-exports

verse.tsx:20 imports transformBibleHtmlForBrowser directly from @youversion/platform-core. The UI package's AGENTS.md explicitly states: "❌ Don't: Import from @youversion/platform-core directly (except re-exports in index.ts)". No other component file in packages/ui/src/components/ imports from @youversion/platform-core — this is the only instance, breaking the established pattern. The import should go through a local re-export or utility file.


The `getFootnoteMarker` function was moved from `verse-html-utils.ts` to
`verse.tsx`. This is because the logic is specifically related to
rendering footnote markers within the Verse component and does not have
broader utility for the entire UI package. This change adheres to the
package boundary guidelines by keeping related logic localized.
Comment on lines +56 to +77
function getVerseHtmlFromDom(container: HTMLElement, verseNum: string): string {
  const wrappers = container.querySelectorAll(`.yv-v[v="${verseNum}"]`);
  if (!wrappers.length) return '';

  const parts: string[] = [];
  let noteIdx = 0;

  wrappers.forEach((wrapper, i) => {
    if (i > 0) parts.push(' ');
    const clone = wrapper.cloneNode(true) as Element;
    clone.querySelectorAll('.yv-h, .yv-vlbl').forEach((el) => el.remove());
    clone.querySelectorAll('[data-verse-footnote]').forEach((anchor) => {
      const sup = document.createElement('sup');
      sup.className = 'yv:text-muted-foreground';
      sup.textContent = getFootnoteMarker(noteIdx++);
      anchor.replaceWith(sup);
    });
    parts.push(clone.innerHTML);
  });

  return parts.join('');
}
Contributor

P1 getVerseHtmlFromDom outputs unsanitized clone.innerHTML into dangerouslySetInnerHTML

getVerseHtmlFromDom clones verse wrapper nodes from the live DOM and returns their raw innerHTML as a string. That string is then set via dangerouslySetInnerHTML={{ __html: verseHtml }} (line 120). The clone preserves any HTML attributes and child content exactly as they appear in the DOM — including any injected markup if the API response ever contained unexpected HTML.

More broadly, this PR removes isomorphic-dompurify and deletes all four XSS security tests (should remove script tags, should remove inline event handlers (onerror), should remove inline event handlers (onclick), should remove javascript: URLs). The new justification comment "comes from our YouVersion APIs and is safe" is an operational assumption, not a technical guarantee. If a Bible translation licensed from a third-party publisher contained crafted footnote content, or if the API were misconfigured, this path would execute arbitrary HTML in the user's browser.

Consider either re-introducing a lightweight sanitization pass (e.g. strip event-handler attributes from the cloned tree before serializing) or adding a note acknowledging this risk has been deliberately accepted, so it is not silently re-reported by security scanners.
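
One possible shape for the lightweight pass mentioned above, as a rough sketch only. It assumes a hypothetical helper, stripActiveAttributes, run on the cloned tree before clone.innerHTML is read. It is written against a minimal structural interface that DOM elements should satisfy, so the logic can be exercised without a browser. It only removes on* attributes and javascript: URLs in a few URL-bearing attributes; it does not remove script elements or handle obfuscated URLs, and it is not a substitute for a maintained sanitizer such as DOMPurify.

```typescript
// Minimal structural view of an element (assumed to be satisfied by DOM Element).
interface AttrElement {
  getAttributeNames(): string[];
  getAttribute(name: string): string | null;
  removeAttribute(name: string): void;
}

interface AttrRoot extends AttrElement {
  querySelectorAll(selector: string): ArrayLike<AttrElement>;
}

// Attributes that can carry a URL; this list is illustrative, not exhaustive.
const URL_ATTRS = ['href', 'src', 'xlink:href'];

// Hypothetical helper: strips inline event handlers and javascript: URLs
// from the root element and all of its descendants, in place.
function stripActiveAttributes(root: AttrRoot): void {
  const elements: AttrElement[] = [root, ...Array.from(root.querySelectorAll('*'))];
  for (const el of elements) {
    // getAttributeNames() returns a fresh array, so removing while looping is safe.
    for (const name of el.getAttributeNames()) {
      const lower = name.toLowerCase();
      const value = (el.getAttribute(name) ?? '').trim().toLowerCase();
      const isHandler = lower.startsWith('on');
      const isScriptUrl = URL_ATTRS.includes(lower) && value.startsWith('javascript:');
      if (isHandler || isScriptUrl) {
        el.removeAttribute(name);
      }
    }
  }
}
```

In getVerseHtmlFromDom this would be called on each clone just before parts.push(clone.innerHTML), which keeps the cost proportional to the size of the cloned verse.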

Comment on lines +207 to 216
// Second pass: create one entry per anchor (each anchor gets its own portal)
const result: VerseFootnoteData[] = [];
anchors.forEach((el) => {
const verseNum = el.getAttribute('data-verse-footnote');
if (verseNum) result.push({ verseNum, el });
if (!verseNum) return;
const allNotes = notesByKey.get(verseNum) || [];
const hasVerseContext = el.closest('.yv-v[v]') !== null;
const verseHtml = hasVerseContext ? getVerseHtmlFromDom(contentRef.current!, verseNum) : '';
result.push({ verseNum, el, notes: allNotes, verseHtml, hasVerseContext });
});
Contributor

P2 getVerseHtmlFromDom called once per anchor instead of once per verse

In the second pass, getVerseHtmlFromDom is invoked for every footnote anchor that has a verse context. If verse 1 has three footnotes, three separate querySelectorAll + cloneNode + querySelectorAll operations are performed to build the identical verseHtml string. For footnote-dense passages (e.g. Psalms) this creates an O(N²) DOM clone loop inside a useLayoutEffect.

The verseHtml depends only on verseNum, so it can be computed once per unique key using the notesByKey map that's already available from the first pass:

// After the first pass, compute verseHtml per unique key
const verseHtmlByKey = new Map<string, string>();
notesByKey.forEach((_, key) => {
  const sample = contentRef.current!.querySelector(`[data-verse-footnote="${key}"]`);
  const hasContext = sample?.closest('.yv-v[v]') !== null;
  verseHtmlByKey.set(key, hasContext ? getVerseHtmlFromDom(contentRef.current!, key) : '');
});

// Then in the second pass:
const verseHtml = verseHtmlByKey.get(verseNum) ?? '';

Comment on lines +306 to +317
export function transformBibleHtmlForNode(html: string): TransformedBibleHtml {
  // eslint-disable-next-line @typescript-eslint/no-require-imports
  const { DOMParser } = require('linkedom') as {
    DOMParser: new () => { parseFromString(html: string, type: string): Document };
  };

  return transformBibleHtml(html, {
    parseHtml: (h: string) =>
      new DOMParser().parseFromString(`<html><body>${h}</body></html>`, 'text/html'),
    serializeHtml: (doc: Document) => doc.body.innerHTML,
  });
}
Contributor

P1 transformBibleHtmlForNode signature incompatible with future ESM-safe fix

The previous thread flagged that require('linkedom') throws ReferenceError in ESM environments. The correct fix is to replace it with await import('linkedom'), but that would force transformBibleHtmlForNode to return Promise<TransformedBibleHtml> instead of TransformedBibleHtml. Every call site in the node tests — and any user code — calls the function synchronously:

const result = transformBibleHtmlForNode(html); // currently sync

Changing to async without a major-version bump would be a silent breaking change. This is a good moment to decide on the intended contract before the API is published. Options:

  1. Ship the function as async now and update all callers/tests to await the result.
  2. Keep it synchronous but document that it only supports CJS environments and add a runtime check for typeof require.
  3. Accept a DOMParser-like factory as a parameter (matching the transformBibleHtml pattern) so the function stays sync and callers provide the linkedom parser.
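
Option 3 could look roughly like the sketch below. The function name transformBibleHtmlWithParser and the DocumentLike / DomParserLike interfaces are hypothetical, and the transformBibleHtml shown here is a trivial stand-in that only parses and serializes; the real function also performs the verse wrapping and footnote steps.

```typescript
// Minimal structural types so the sketch does not depend on lib.dom.
interface DocumentLike {
  body: { innerHTML: string };
}

interface DomParserLike {
  parseFromString(html: string, type: string): DocumentLike;
}

interface TransformedBibleHtml {
  html: string;
}

interface DomAdapters {
  parseHtml: (html: string) => DocumentLike;
  serializeHtml: (doc: DocumentLike) => string;
}

// Stand-in for the core transformer: parse, (transform), serialize.
function transformBibleHtml(html: string, adapters: DomAdapters): TransformedBibleHtml {
  const doc = adapters.parseHtml(html);
  return { html: adapters.serializeHtml(doc) };
}

// Hypothetical sync wrapper: the caller supplies the parser factory,
// so core never has to require() or import() a DOM implementation itself.
function transformBibleHtmlWithParser(
  html: string,
  createParser: () => DomParserLike,
): TransformedBibleHtml {
  const parser = createParser();
  return transformBibleHtml(html, {
    // Wrap in <html><body> so doc.body.innerHTML serializes the fragment.
    parseHtml: (h) => parser.parseFromString(`<html><body>${h}</body></html>`, 'text/html'),
    serializeHtml: (doc) => doc.body.innerHTML,
  });
}
```

A Node caller would import DOMParser from linkedom in its own module and pass () => new DOMParser(). That keeps the function synchronous and leaves the choice of DOM implementation with the consumer.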


const FOOTNOTE_KEY_ATTR = 'data-footnote-key';

const NEEDS_SPACE_BEFORE = /^[^\s.,;:!?)}\]'"»›]/;
Contributor

P2 NEEDS_SPACE_BEFORE missing ASCII apostrophe causes spurious space before possessives

The regex character class excludes ’ (U+2019 RIGHT SINGLE QUOTATION MARK) and “ (U+201C LEFT DOUBLE QUOTATION MARK), but the straight ASCII apostrophe ' (U+0027) is absent from the negated list. Any text node whose first character is a plain apostrophe (possessives like 's, contractions like 'll, 've) will match the regex and set nextNeedsSpace = true, inserting a spurious space after the footnote anchor.

For example God[fn]'s word would become God [fn] 's word instead of God[fn]'s word.

Suggested change
const NEEDS_SPACE_BEFORE = /^[^\s.,;:!?)}\]’“»]/;
const NEEDS_SPACE_BEFORE = /^[^\s.,;:!?)}\]’“'»]/;
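
To make the effect of the character class concrete, here is a small check. It uses the form of the regex quoted earlier in this thread that already contains the ASCII apostrophe; the helper needsSpaceBefore is a hypothetical wrapper added only for the demonstration.

```typescript
// Character class copied from the snippet quoted in this thread; it lists
// the leading characters that must NOT get a space inserted before them.
const NEEDS_SPACE_BEFORE = /^[^\s.,;:!?)}\]'"»›]/;

// Hypothetical wrapper: true when the text following a footnote anchor
// starts with a character that should be separated by a space.
function needsSpaceBefore(textAfterAnchor: string): boolean {
  return NEEDS_SPACE_BEFORE.test(textAfterAnchor);
}

const samples = ["'s word", 'word', ', and', ''];
for (const sample of samples) {
  console.log(JSON.stringify(sample), needsSpaceBefore(sample));
}
```

With the apostrophe in the class, "'s word" no longer matches, so God[fn]'s word keeps its possessive attached, while a following plain word still gets its separating space.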
