fix: use absolute XPath to prevent hang on large HWP files #240
Open
kkoey-kim wants to merge 1 commit into
The ODT conversion hangs indefinitely on large HWP files (e.g., 120 chapters, 18,332 paragraphs) because `//Style[...]`, `//CharShape[...]`, and `//ParaShape[...]` scan the full document tree from the root for each paragraph. With 18,332 paragraphs × 5 lookups = ~90,000 full tree traversals on a 14.7 MB XML intermediate, the XSLT processor never completes.

Replace descendant-or-self (`//`) lookups with absolute paths (`/HwpDoc/DocInfo/IdMappings/...`), which navigate directly to the target nodes. This is consistent with other lookups in the same file (lines 226, 234, 354, etc.) that already use absolute paths.

Before: hangs indefinitely (tested >10 minutes, never completes)
After: completes in ~147 seconds for a 120-chapter HWP file

The output is identical since the same nodes are selected; only the lookup strategy changes, from an O(n) scan of the whole tree to a direct path whose cost does not grow with the number of paragraphs.
Problem

ODT conversion via `hwp5odt` hangs indefinitely on large HWP files (e.g., 120 chapters, 18,332 paragraphs).

Root cause: In `common.xsl`, five XPath lookups use `//Style[...]`, `//CharShape[...]`, and `//ParaShape[...]`, which perform full document tree scans from the root for every paragraph. With 18,332 paragraphs × 5 lookups = ~90,000 full tree traversals on a 14.7 MB XML intermediate, the XSLT processor never completes.

Fix
Replace descendant-or-self (`//`) lookups with absolute paths (`/HwpDoc/DocInfo/IdMappings/...`), which navigate directly to the target nodes.

This is consistent with other lookups in the same file (lines 226, 234, 354, 876, 879, 912, etc.) that already use absolute paths for the same node types.
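To make the cost difference concrete, here is a minimal Python sketch using only the standard library. It is not the project's code: the tree is synthetic, `BodyText` and `Paragraph` are placeholder element names, and the "visited" numbers are a simple model of an unoptimized processor rather than a measurement of any real XSLT engine.

```python
import xml.etree.ElementTree as ET


def build_doc(n_paragraphs: int, n_styles: int) -> ET.Element:
    """Build a synthetic tree shaped like /HwpDoc/DocInfo/IdMappings/Style.

    BodyText and Paragraph are hypothetical names used only to add bulk.
    """
    root = ET.Element("HwpDoc")
    id_mappings = ET.SubElement(ET.SubElement(root, "DocInfo"), "IdMappings")
    for i in range(n_styles):
        ET.SubElement(id_mappings, "Style", name=f"style-{i + 1}")
    body = ET.SubElement(root, "BodyText")
    for _ in range(n_paragraphs):
        ET.SubElement(body, "Paragraph")
    return root


def descendant_lookup(root: ET.Element, style_id: int):
    """Analogue of //Style[$style-id]: search the whole tree."""
    match = root.findall(f".//Style[{style_id}]")[0]
    # Model: an unindexed descendant scan examines every element once.
    visited = sum(1 for _ in root.iter())
    return match, visited


def direct_lookup(root: ET.Element, style_id: int):
    """Analogue of /HwpDoc/DocInfo/IdMappings/Style[$style-id]."""
    id_mappings = root.find("DocInfo/IdMappings")
    match = id_mappings.findall(f"Style[{style_id}]")[0]
    # Model: two steps down the fixed path, then the children of IdMappings.
    visited = 2 + len(id_mappings)
    return match, visited


doc = build_doc(n_paragraphs=1000, n_styles=10)
scan_match, scan_cost = descendant_lookup(doc, 3)
path_match, path_cost = direct_lookup(doc, 3)
print(scan_match is path_match, scan_cost, path_cost)
```

In this model the scan cost grows with the paragraph count while the direct-path cost depends only on the size of `IdMappings`, and the scan is repeated for every lookup on every paragraph.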
Changes

`src/hwp5/xsl/odt/common.xsl` (5 lines changed):

| Before | After |
| --- | --- |
| `//Style[$style-id]` | `/HwpDoc/DocInfo/IdMappings/Style[$style-id]` |
| `//CharShape[$charshape-id]` | `/HwpDoc/DocInfo/IdMappings/CharShape[$charshape-id]` |
| `//Style[$style-id]` | `/HwpDoc/DocInfo/IdMappings/Style[$style-id]` |
| `//ParaShape[$parashape-id]` | `/HwpDoc/DocInfo/IdMappings/ParaShape[$parashape-id]` |
| `//ParaShape[$parashape-id]` | `/HwpDoc/DocInfo/IdMappings/ParaShape[$parashape-id]` |

Results
The output is identical since the same nodes are selected — only the lookup strategy changes.
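
One way to sanity-check the "same nodes" claim on an intermediate XML file is to compare what the two expressions select. The sketch below does this on a tiny hand-written sample with Python's standard library. The attribute names and the `BodyText`/`Paragraph` elements are invented for illustration, and the sample is not real `hwp5` output.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the intermediate XML; only the
# HwpDoc/DocInfo/IdMappings path is taken from the PR description.
SAMPLE = """
<HwpDoc>
  <DocInfo>
    <IdMappings>
      <Style name="Normal"/><Style name="Body"/>
      <CharShape height="1000"/><CharShape height="1200"/>
      <ParaShape align="left"/><ParaShape align="center"/>
    </IdMappings>
  </DocInfo>
  <BodyText><Paragraph/><Paragraph/></BodyText>
</HwpDoc>
"""


def selections_match(root: ET.Element, tag: str, index: int) -> bool:
    """Return True if the descendant form and the fixed-path form
    select exactly the same element objects for tag[index]."""
    by_scan = root.findall(f".//{tag}[{index}]")
    by_path = root.findall(f"DocInfo/IdMappings/{tag}[{index}]")
    return len(by_scan) == len(by_path) and all(
        a is b for a, b in zip(by_scan, by_path)
    )


root = ET.fromstring(SAMPLE)
results = {
    (tag, index): selections_match(root, tag, index)
    for tag in ("Style", "CharShape", "ParaShape")
    for index in (1, 2)
}
print(results)
```

The two forms agree as long as `Style`, `CharShape`, and `ParaShape` elements occur only under `IdMappings`. If one of them also appeared elsewhere in the tree, the `//` form would select additional nodes and this check would return `False` for that tag.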