bug: PHP call graph drops file-scope (top-level) calls; include/require not mapped → ~0% coverage on procedural PHP #367

@atahan150

Description

What happened?

The PHP call-graph extractor misses all file-scope (top-level) calls, and include/require statements are not captured as dependencies. On procedural PHP codebases (where most code runs at file scope rather than inside functions/methods), this yields ~0% call-site coverage: calls edges and file→file dependency edges are almost entirely absent, even though the deterministic tree-sitter parse succeeds.

Root cause 1 — top-level calls dropped. In packages/core/src/plugins/extractors/php-extractor.ts, extractCallGraph() only records a call when it is lexically inside a function/method body:

// Extract call expressions
if (functionStack.length > 0) {        // <-- file-scope calls never recorded
  const caller = functionStack[functionStack.length - 1];
  ...push {caller, callee} ...
}

Any function_call_expression / member_call_expression / scoped_call_expression at file scope produces no entry. This is fine for OOP/modular code, but page-style procedural PHP (e.g. a controller/view file that calls helpers at the top level) ends up with an empty call graph.
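
To make the gating concrete, here is a minimal sketch on a toy AST rather than the real tree-sitter nodes. The `ToyNode` type, `collectCalls`, and the `<file>` label are all hypothetical names used for illustration; none of them come from php-extractor.ts. The only behavioral change against the snippet above is the fallback caller used when the function stack is empty.

```typescript
// Toy AST: a hypothetical stand-in for the tree-sitter nodes the real extractor walks.
type ToyNode =
  | { kind: 'function'; name: string; body: ToyNode[] }
  | { kind: 'call'; callee: string };

interface CallEdge {
  caller: string;
  callee: string;
}

// Hypothetical label for the synthetic file/module-scope caller.
const FILE_SCOPE = '<file>';

function collectCalls(
  nodes: ToyNode[],
  functionStack: string[] = [],
  out: CallEdge[] = [],
): CallEdge[] {
  for (const node of nodes) {
    if (node.kind === 'function') {
      // Entering a function body: calls inside are attributed to it.
      functionStack.push(node.name);
      collectCalls(node.body, functionStack, out);
      functionStack.pop();
    } else {
      // The current extractor records the call only when functionStack.length > 0.
      // Here an empty stack falls back to the synthetic file-scope caller instead.
      const caller =
        functionStack.length > 0
          ? functionStack[functionStack.length - 1]
          : FILE_SCOPE;
      out.push({ caller, callee: node.callee });
    }
  }
  return out;
}

// A file with one top-level call and one in-function call, reduced to the toy AST.
const toyFile: ToyNode[] = [
  { kind: 'function', name: 'helper', body: [] },
  { kind: 'call', callee: 'helper' }, // top-level call
  { kind: 'function', name: 'caller', body: [{ kind: 'call', callee: 'helper' }] },
];

const toyEdges = collectCalls(toyFile);
console.log(toyEdges);
```

With the fallback in place, the top-level call shows up as an edge from `<file>` instead of being dropped; the in-function edge is unchanged.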

Root cause 2 — include/require not mapped. extractStructure() maps imports only from namespace_use_declaration (PHP use). include / include_once / require / require_once — the actual dependency mechanism in non-namespaced/legacy PHP — are never turned into import/dependency edges. So file→file edges are also missing for these codebases.
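
As a rough illustration of what mapping include/require could look like, the sketch below scans PHP source with a regular expression and resolves only the simplest static path forms. This is an assumption-heavy stand-in: a real fix would presumably walk the tree-sitter AST instead, and the regex here does not skip comments or string contents. `IncludeEdge`, `findIncludes`, and `resolveStaticPath` are hypothetical names, not part of @understand-anything/core.

```typescript
type IncludeKind = 'include' | 'include_once' | 'require' | 'require_once';

interface IncludeEdge {
  kind: IncludeKind;
  // Path as written, or null when the argument is not statically determinable.
  target: string | null;
  // True when the path was anchored with __DIR__ or dirname(__FILE__).
  anchoredToFileDir: boolean;
}

function resolveStaticPath(expr: string): { target: string | null; anchored: boolean } {
  let e = expr.trim();
  // Tolerate the function-call style: include('x.php');
  if (e.startsWith('(') && e.endsWith(')')) e = e.slice(1, -1).trim();

  // __DIR__ . '/x.php'  or  dirname(__FILE__) . '/x.php'
  const anchored = /^(?:__DIR__|dirname\(\s*__FILE__\s*\))\s*\.\s*(['"])\/?([^'"$]+)\1$/.exec(e);
  if (anchored) return { target: anchored[2], anchored: true };

  // Plain string literal with no variable interpolation.
  const literal = /^(['"])([^'"$]+)\1$/.exec(e);
  if (literal) return { target: literal[2], anchored: false };

  // Anything involving variables, constants, or function calls is left unresolved.
  return { target: null, anchored: false };
}

function findIncludes(source: string): IncludeEdge[] {
  // Naive statement matcher: keyword, optional _once, then everything up to the next semicolon.
  const stmt = /\b(include|require)(_once)?\b\s*([^;]*);/g;
  const edges: IncludeEdge[] = [];
  let m: RegExpExecArray | null;
  while ((m = stmt.exec(source)) !== null) {
    const kind = (m[1] + (m[2] ?? '')) as IncludeKind;
    const { target, anchored } = resolveStaticPath(m[3]);
    edges.push({ kind, target, anchoredToFileDir: anchored });
  }
  return edges;
}

const samplePhp = `<?php
require_once __DIR__ . '/helpers.php';
include 'partials/header.php';
require $themeDir . '/functions.php';
`;

const includeEdges = findIncludes(samplePhp);
console.log(includeEdges);
```

The third statement is deliberately left with a null target: a dynamic path can still be recorded as an unresolved dependency rather than silently discarded.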

The LLM file-analyzer layer does not recover this: per agents/file-analyzer.md (~line 145) it is told not to re-read source for code files, and it infers calls edges from the import map + neighbor symbols — both empty here.

Expected: file-scope calls represented in the call graph (e.g. with a synthetic file/module-scope caller), and include/require(_once) mapped to dependency edges.
Actual: 0 call edges and 0 include-based dependency edges for top-level procedural PHP files.

Minimal reproduction

Minimal PHP file (demo.php):

<?php
require_once __DIR__ . '/helpers.php';   // (A) not captured as a dependency edge

function helper() { return 1; }

helper();                                 // (B) top-level call — DROPPED

function caller() {
    helper();                             // (C) in-function call — captured
}

Run the bundled extractor (same engine file-analyzer uses):

import fs from 'node:fs';
import { TreeSitterPlugin, builtinLanguageConfigs } from '@understand-anything/core';
const p = new TreeSitterPlugin(builtinLanguageConfigs);
await p.init();
const src = fs.readFileSync('demo.php', 'utf8');
console.log(p.extractCallGraph('demo.php', src)); // -> only the (C) helper() call; (B) is missing
console.log(p.analyzeFile('demo.php', src).imports); // -> [] ; the require_once (A) is missing

Measured on a real procedural PHP CMS (~380 PHP files, ~84k lines), extractCallGraph per file vs grep ground-truth of call sites:

| File | Type | Real call sites (grep) | Captured | Coverage |
|---|---|---|---|---|
| page file (top-level heavy) | view/controller | ~645 | 0 | 0% |
| another page file | view/controller | ~1218 | 0 | 0% |
| helpers file (function defs) | functions | ~959 | 737 | ~77% |
| theme helpers file | functions | ~875 | 456 | ~52% |

For one heavily-used helper, 99/99 call sites in the two page files were invisible, while 57/58 in-function call sites were captured. So in-function extraction is healthy; the gap is specifically file-scope calls + include/require.
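
For reference, the coverage column is simply captured call sites divided by grep-counted call sites. The sketch below recomputes it from the figures above, treating the approximate grep counts as exact for the arithmetic; `coveragePercent` and the row shape are illustrative names only.

```typescript
interface CoverageRow {
  file: string;
  realCallSites: number; // grep ground truth (approximate in the measurements)
  captured: number;      // entries returned by extractCallGraph
}

const coverageRows: CoverageRow[] = [
  { file: 'page file (top-level heavy)', realCallSites: 645, captured: 0 },
  { file: 'another page file', realCallSites: 1218, captured: 0 },
  { file: 'helpers file (function defs)', realCallSites: 959, captured: 737 },
  { file: 'theme helpers file', realCallSites: 875, captured: 456 },
];

// Coverage as a percentage; a file with no call sites is reported as 0.
function coveragePercent(captured: number, total: number): number {
  return total === 0 ? 0 : (captured / total) * 100;
}

for (const row of coverageRows) {
  const pct = coveragePercent(row.captured, row.realCallSites);
  console.log(`${row.file}: ${pct.toFixed(0)}%`);
}
```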

Suggested direction (optional)

  • In extractCallGraph, also record calls when functionStack is empty, attributing them to a synthetic file/module-scope caller (e.g. <file> or the module node), so impact analysis can answer "who calls X" for procedural code.
  • In the PHP extractor, recognize include / include_once / require / require_once expressions and emit import/dependency edges (resolving the path argument where statically determinable).
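
To show why a synthetic caller matters for impact analysis, here is a small sketch of a "who calls X" lookup over call edges. The edge shape and the `demo.php#<file>` naming are assumptions made for the example; the project may well prefer a different identifier for the file-scope node.

```typescript
interface ImpactEdge {
  caller: string;
  callee: string;
}

// Returns the distinct callers of a function, sorted for stable output.
function callersOf(edges: ImpactEdge[], callee: string): string[] {
  const callers = new Set<string>();
  for (const edge of edges) {
    if (edge.callee === callee) callers.add(edge.caller);
  }
  return Array.from(callers).sort();
}

// Current behavior: only the in-function call is present.
const edgesToday: ImpactEdge[] = [{ caller: 'caller', callee: 'helper' }];

// With a synthetic file-scope caller (hypothetical id), the page-level usage becomes visible.
const edgesWithFileScope: ImpactEdge[] = [
  { caller: 'demo.php#<file>', callee: 'helper' },
  { caller: 'caller', callee: 'helper' },
];

console.log(callersOf(edgesToday, 'helper'));
console.log(callersOf(edgesWithFileScope, 'helper'));
```

In the first lookup the page file never appears as a dependent of `helper`, which is the blind spot described in this issue.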

Happy to help test against the procedural codebase if useful.

Plugin version

2.7.5 (main @ HEAD, cloned 2026-06-02; tree-sitter-php@0.23.12)

Platform / client

Claude Code (CLI) — reproduced by invoking the bundled @understand-anything/core TreeSitterPlugin.extractCallGraph directly (the same path file-analyzer uses via extract-structure.mjs).

OS + Node version

Windows 11 (x64), Node v24.2.0, pnpm 10.33.4

Primary language of the analyzed project

PHP (legacy/procedural — non-namespaced, include/require-based)

Approximate file count

~380 PHP files (~84k lines)
