fix: use absolute XPath to prevent hang on large HWP files #240

Open
kkoey-kim wants to merge 1 commit into mete0r:master from kkoey-kim:fix/odt-xslt-performance

@kkoey-kim

Problem

ODT conversion via hwp5odt hangs indefinitely on large HWP files (e.g., 120 chapters, 18,332 paragraphs).

Root cause: In common.xsl, five XPath lookups use //Style[...], //CharShape[...], and //ParaShape[...]. The // abbreviation expands to /descendant-or-self::node()/, so each lookup can scan the whole document tree from the root, and these lookups run for every paragraph. With 18,332 paragraphs × 5 lookups = 91,660 (~90,000) full tree traversals on a 14.7MB XML intermediate, the XSLT processor does not finish in any practical time.
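
The estimate above can be illustrated with a rough cost model. This is only a sketch on a toy tree, not a measurement of the real XSLT processor: the `BodyText` and `Paragraph` element names and both helper functions are hypothetical, and a real processor may optimize `//` lookups differently.

```python
import xml.etree.ElementTree as ET

# Figures taken from the PR description.
PARAGRAPHS = 18_332
LOOKUPS_PER_PARAGRAPH = 5
TOTAL_LOOKUPS = PARAGRAPHS * LOOKUPS_PER_PARAGRAPH  # 91,660


def build_doc(n_paragraphs, n_styles=3):
    """Build a toy stand-in for the XML intermediate.

    BodyText/Paragraph are hypothetical names used for illustration only.
    """
    root = ET.Element("HwpDoc")
    doc_info = ET.SubElement(root, "DocInfo")
    id_mappings = ET.SubElement(doc_info, "IdMappings")
    for i in range(n_styles):
        ET.SubElement(id_mappings, "Style", name=f"style-{i}")
    body = ET.SubElement(root, "BodyText")
    for _ in range(n_paragraphs):
        ET.SubElement(body, "Paragraph")
    return root


def nodes_touched_by_descendant_scan(root):
    # A naive //Style[...] has to consider every element in the document.
    return sum(1 for _ in root.iter())


def nodes_touched_by_direct_path(root):
    # /HwpDoc/DocInfo/IdMappings/Style[...] only inspects the children of
    # each element along the path.
    doc_info = root.find("DocInfo")
    id_mappings = doc_info.find("IdMappings")
    return 1 + len(root) + len(doc_info) + len(id_mappings)


for n in (10, 1_000):
    doc = build_doc(n)
    print(n, nodes_touched_by_descendant_scan(doc), nodes_touched_by_direct_path(doc))
```

In this model the descendant scan grows with the number of paragraphs, while the direct path stays constant for a fixed `IdMappings` section, which is why repeating the scan once per lookup becomes quadratic in practice.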

Fix

Replace descendant-or-self (//) lookups with absolute paths (/HwpDoc/DocInfo/IdMappings/...), which navigate directly to the target nodes.

This is consistent with other lookups in the same file (lines 226, 234, 354, 876, 879, 912, etc.) that already use absolute paths for the same node types.

Changes

src/hwp5/xsl/odt/common.xsl — 5 lines changed:

| Line | Before | After |
| --- | --- | --- |
| 717 | `//Style[$style-id]` | `/HwpDoc/DocInfo/IdMappings/Style[$style-id]` |
| 753 | `//CharShape[$charshape-id]` | `/HwpDoc/DocInfo/IdMappings/CharShape[$charshape-id]` |
| 761 | `//Style[$style-id]` | `/HwpDoc/DocInfo/IdMappings/Style[$style-id]` |
| 764 | `//ParaShape[$parashape-id]` | `/HwpDoc/DocInfo/IdMappings/ParaShape[$parashape-id]` |
| 847 | `//ParaShape[$parashape-id]` | `/HwpDoc/DocInfo/IdMappings/ParaShape[$parashape-id]` |

Results

| | Before | After |
| --- | --- | --- |
| 120-chapter HWP (1MB, 18,332 paragraphs) | Hangs indefinitely (>10 min) | 147 seconds |
| Output | Never produced | 19.3MB ODT, valid DOCX via LibreOffice |

The output is identical as long as Style, CharShape, and ParaShape elements occur only under /HwpDoc/DocInfo/IdMappings: the same nodes are then selected, and only the lookup strategy changes.
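
The equivalence claim can be checked on a small example. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`, not the project's XSLT pipeline; the sample document, the `Stray` element, and `select_both_ways` are hypothetical, and it assumes the predicate is positional (that is, the variable holds a number).

```python
import xml.etree.ElementTree as ET

# Toy document; only the HwpDoc/DocInfo/IdMappings path comes from the PR.
DOC = """
<HwpDoc>
  <DocInfo>
    <IdMappings>
      <Style name="Normal"/>
      <Style name="Heading"/>
      <CharShape/>
    </IdMappings>
  </DocInfo>
  <BodyText>
    <Paragraph/>
    <Paragraph/>
  </BodyText>
</HwpDoc>
"""


def select_both_ways(root, index):
    # ElementTree paths are relative to `root` (the HwpDoc element), so
    # ".//Style[n]" stands in for //Style[n], and
    # "DocInfo/IdMappings/Style[n]" for /HwpDoc/DocInfo/IdMappings/Style[n].
    by_scan = root.findall(f".//Style[{index}]")
    by_path = root.findall(f"DocInfo/IdMappings/Style[{index}]")
    return by_scan, by_path


root = ET.fromstring(DOC)
by_scan, by_path = select_both_ways(root, 2)
print(len(by_scan) == 1 and by_scan[0] is by_path[0])

# Counter-example: if Style elements also appear elsewhere, the
# descendant form selects the n-th Style child of every parent.
stray = ET.SubElement(root.find("BodyText"), "Stray")
ET.SubElement(stray, "Style", name="a")
ET.SubElement(stray, "Style", name="b")
by_scan, by_path = select_both_ways(root, 2)
print(len(by_scan), len(by_path))
```

The second half shows why the condition matters: the two forms agree only while the looked-up elements live in a single place in the tree.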

The ODT conversion hangs indefinitely on large HWP files (e.g., 120 chapters,
18,332 paragraphs) because `//Style[...]`, `//CharShape[...]`, and
`//ParaShape[...]` perform full document tree scans from the root for each
paragraph. With 18,332 paragraphs × 5 lookups = ~90,000 full tree traversals
on a 14.7MB XML intermediate, the XSLT processor never completes.

Replace descendant-or-self (`//`) lookups with absolute paths
(`/HwpDoc/DocInfo/IdMappings/...`) which navigate directly to the target
nodes. This is consistent with other lookups in the same file (lines 226,
234, 354, etc.) that already use absolute paths.

Before: hangs indefinitely (tested >10 minutes, never completes)
After: completes in ~147 seconds for 120-chapter HWP file

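The output is identical since the same nodes are selected; only the
lookup strategy changes, from a scan of the whole document tree (O(n) in
document size) to a direct path that visits only the children of HwpDoc,
DocInfo, and IdMappings.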