Fix extractPlainText() dropping content from Code, Math, and RawInline nodes #86

Merged
dereuromark merged 1 commit into master from fix/heading-id-inline-code-math-content
Feb 26, 2026

josbeir (Contributor) commented Feb 26, 2026

Problem

HeadingIdTracker::extractPlainText() recursively walks a heading's child nodes but only reads text from Text, SoftBreak, and HardBreak. Inline nodes that carry their value in a $content property (Code, Math, and RawInline) fall through to the generic Node fallback, which recurses into them and finds no Text children, so their content is silently dropped.

This affects TOC entry labels, auto-generated heading anchor IDs, and anything that calls getPlainText() on a heading.

Reproducer:

$extension = new TableOfContentsExtension();
$converter = new DjotConverter();
$converter->addExtension($extension);
$converter->convert("## The `foo` function\n");

var_dump($extension->getToc()[0]['text']);
// Expected: 'The foo function'
// Actual:   'The  function'

Fix

Add an explicit elseif branch for Code, Math, and RawInline before the generic Node fallback in extractPlainText():

} elseif ($child instanceof Code || $child instanceof Math || $child instanceof RawInline) {
    $text .= $child->getContent();
}
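
To make the before/after behavior concrete, here is a minimal, self-contained sketch of the traversal. The classes below are hypothetical stand-ins (including the ContentNode base class and the $handleContentNodes flag, neither of which is in the library), so treat it as an illustration of the branch order rather than the actual HeadingIdTracker code. It assumes PHP 8.0 or newer.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins for the AST classes; the real ones differ.
class Node
{
    /** @param Node[] $children */
    public function __construct(private array $children = []) {}

    /** @return Node[] */
    public function getChildren(): array { return $this->children; }
}

// Assumed helper base class for leaf nodes that store their value in $content.
abstract class ContentNode extends Node
{
    public function __construct(private string $content) { parent::__construct(); }

    public function getContent(): string { return $this->content; }
}

final class Text extends ContentNode {}
final class Code extends ContentNode {}
final class Math extends ContentNode {}
final class RawInline extends ContentNode {}
final class SoftBreak extends Node {}
final class HardBreak extends Node {}

// $handleContentNodes toggles the new branch so both behaviors can be compared.
function extractPlainText(Node $node, bool $handleContentNodes): string
{
    $text = '';
    foreach ($node->getChildren() as $child) {
        if ($child instanceof Text) {
            $text .= $child->getContent();
        } elseif ($child instanceof SoftBreak || $child instanceof HardBreak) {
            $text .= ' ';
        } elseif ($handleContentNodes
            && ($child instanceof Code || $child instanceof Math || $child instanceof RawInline)
        ) {
            // The added branch: read the node's own content directly.
            $text .= $child->getContent();
        } else {
            // Generic fallback: a Code node has no children, so this adds ''.
            $text .= extractPlainText($child, $handleContentNodes);
        }
    }

    return $text;
}

$heading = new Node([new Text('The '), new Code('foo'), new Text(' function')]);

echo extractPlainText($heading, false), "\n"; // The  function
echo extractPlainText($heading, true), "\n";  // The foo function
```

The new branch has to come before the generic fallback, otherwise the fallback's instanceof Node check would match first and the content would still be lost.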

Changes

  • HeadingIdTracker.php — import Code, Math, RawInline; add the new branch in extractPlainText()
  • TableOfContentsExtensionTest.php — add testInlineCodeTextIsIncludedInTocEntry() and testInlineMathTextIsIncludedInTocEntry()

…e nodes

HeadingIdTracker::extractPlainText() recursed into Code, Math, and
RawInline nodes as generic Node children but never called getContent()
on them, so their text was silently dropped from TOC labels and
heading IDs.

Add explicit handling for these content-bearing inline nodes before the
generic Node fallback, so their content is appended to the plain-text
output.

Fixes: heading TOC entries and IDs for headings containing inline code
spans, math, or raw inline markup (e.g. '## The `foo` function' now
yields text 'The foo function').
Copilot AI review requested due to automatic review settings February 26, 2026 17:27
codecov bot commented Feb 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.86%. Comparing base (8e489e4) to head (67b5dbd).
⚠️ Report is 9 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##             master      #86   +/-   ##
=========================================
  Coverage     93.86%   93.86%           
- Complexity     2130     2133    +3     
=========================================
  Files            77       77           
  Lines          5704     5706    +2     
=========================================
+ Hits           5354     5356    +2     
  Misses          350      350           

dereuromark merged commit c5811a3 into master on Feb 26, 2026
10 checks passed
dereuromark deleted the fix/heading-id-inline-code-math-content branch on February 26, 2026 17:33

Copilot AI left a comment

Pull request overview

This pull request fixes a bug in HeadingIdTracker::extractPlainText() where inline nodes containing text in a $content property (Code, Math, and RawInline) were being silently dropped during plain text extraction. This affected table of contents entry labels and auto-generated heading anchor IDs.

Changes:

  • Added explicit handling for Code, Math, and RawInline nodes in the extractPlainText() method
  • Added test cases for inline code and inline math in TOC entries

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/Renderer/HeadingIdTracker.php | Added imports for Code, Math, and RawInline classes; added conditional branch to extract content from these node types |
| tests/TestCase/Extension/TableOfContentsExtensionTest.php | Added two test methods verifying that inline code and inline math content is properly included in TOC entries |


    $text .= $child->getContent();
} elseif ($child instanceof SoftBreak || $child instanceof HardBreak) {
    $text .= ' ';
} elseif ($child instanceof Code || $child instanceof Math || $child instanceof RawInline) {

Copilot AI, Feb 26, 2026

RawInline content should not be included in plain text extraction. The PlainTextRenderer (src/Renderer/PlainTextRenderer.php:139) explicitly skips RawInline nodes with the comment "Skip raw inlines (format-specific)". Raw inline content is format-specific (e.g., HTML, LaTeX) and should not be part of heading IDs or TOC entries. Consider removing RawInline from this condition and only including Code and Math.

Suggested change
- } elseif ($child instanceof Code || $child instanceof Math || $child instanceof RawInline) {
+ } elseif ($child instanceof Code || $child instanceof Math) {
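
As a rough illustration of what this suggestion would mean for the output, the sketch below skips raw inline nodes while keeping code and math. The InlineNode classes and the plainTextWithoutRaw() helper are hypothetical and exist only for this example; it assumes PHP 8.1 or newer for readonly properties.

```php
<?php
declare(strict_types=1);

// Hypothetical minimal inline nodes, for illustration only.
abstract class InlineNode
{
    public function __construct(public readonly string $content) {}
}

final class Text extends InlineNode {}
final class Code extends InlineNode {}
final class Math extends InlineNode {}
final class RawInline extends InlineNode {}

/**
 * Concatenates inline content, skipping format-specific raw inlines.
 *
 * @param InlineNode[] $inlines
 */
function plainTextWithoutRaw(array $inlines): string
{
    $text = '';
    foreach ($inlines as $inline) {
        if ($inline instanceof RawInline) {
            continue; // raw HTML/LaTeX is not meaningful as plain text
        }
        $text .= $inline->content;
    }

    return $text;
}

$inlines = [
    new Text('Use '),
    new Code('foo'),
    new RawInline('<br>'),
    new Text(' here'),
];

echo plainTextWithoutRaw($inlines), "\n"; // Use foo here
```

The trade-off is consistency with a renderer that skips raw inlines versus keeping every visible character of the heading in its ID.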

Comment on lines +139 to +140
} elseif ($child instanceof Code || $child instanceof Math || $child instanceof RawInline) {
    $text .= $child->getContent();

Copilot AI, Feb 26, 2026

Consider whether Symbol and FootnoteRef inline nodes should also be handled in plain text extraction. The PlainTextRenderer renders Symbol as ':name:' and FootnoteRef as '[label]'. Symbol content (e.g., ':heart:' in a heading like '## I :heart: Djot') could be meaningful for heading IDs and TOC entries, though FootnoteRef references are probably better excluded. Review the PlainTextRenderer implementation at src/Renderer/PlainTextRenderer.php:135-136 for reference.


        $toc = $tocExtension->getToc();

        $this->assertSame('Equation E=mc^2', $toc[0]['text']);
    }

Copilot AI, Feb 26, 2026

The test coverage is incomplete - RawInline is included in the fix but there's no test case demonstrating its behavior. Given that RawInline content is format-specific and the PlainTextRenderer explicitly skips it, a test would help clarify the intended behavior. However, as noted in another comment, RawInline probably should not be included in the plain text extraction at all.

Suggested change
-     }
+     }
+
+     public function testRawInlineIsExcludedFromTocEntry(): void
+     {
+         $converter = new DjotConverter();
+         $tocExtension = new TableOfContentsExtension();
+         $converter->addExtension($tocExtension);
+         // Djot raw inline: {=html}<span>raw</span>
+         $converter->convert("## Hello {=html}<span>raw</span>\n");
+         $toc = $tocExtension->getToc();
+         // PlainTextRenderer skips RawInline content, so only "Hello" should remain
+         $this->assertSame('Hello', $toc[0]['text']);
+         $this->assertSame('Hello', $toc[0]['id']);
+     }

        $toc = $tocExtension->getToc();

        $this->assertSame('Equation E=mc^2', $toc[0]['text']);

Copilot AI, Feb 26, 2026

The inline math test should also verify the generated heading ID, similar to the inline code test on line 339. This would ensure that math content is properly included in both the TOC text and the auto-generated ID. The expected ID for 'Equation E=mc^2' would be 'Equation-E=mc^2' based on the normalizeId implementation which only removes '#' and replaces whitespace with dashes.

Suggested change
-         $this->assertSame('Equation E=mc^2', $toc[0]['text']);
+         $this->assertSame('Equation E=mc^2', $toc[0]['text']);
+         $this->assertSame('Equation-E=mc^2', $toc[0]['id']);
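
The expected ID above follows from the normalization described in this comment. The function below is a hypothetical re-implementation based only on that description (strip '#', turn whitespace into dashes); the trimming and the collapsing of whitespace runs are assumptions, and the library's actual normalizeId() may behave differently.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch, not the library's normalizeId().
function normalizeIdSketch(string $text): string
{
    // Remove every '#' character.
    $withoutHashes = str_replace('#', '', $text);

    // Assumption: trim, then replace each run of whitespace with one dash.
    return (string) preg_replace('/\s+/', '-', trim($withoutHashes));
}

echo normalizeIdSketch('Equation E=mc^2'), "\n";  // Equation-E=mc^2
echo normalizeIdSketch('The foo function'), "\n"; // The-foo-function
```

Under these assumptions, characters such as '=' and '^' pass through unchanged, which is why the math content appears verbatim in the suggested ID.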


dereuromark (Contributor) commented

what about those comments about inline?

josbeir (Contributor, Author) commented Feb 27, 2026

> what about those comments about inline?

Merged and released!

dereuromark added the bug (Something isn't working) label on Mar 8, 2026