Idea: raw text snippet windows as profiler context/double-check #36

@renaudcepre

Idea (to explore)

The profiler currently only sees what the analyzer extracted (structured fragments). If the analyzer missed a subtle mention or miscategorized a role, the profiler never sees it.

Idea: for each character, scan raw source files for all occurrences of their name, extract ±200 chars of surrounding text, and pass those snippets as additional context to the profiler.
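A minimal sketch of that extraction step, assuming Python and a source file already loaded as a string. The function name `extract_snippets` and the `radius` parameter are hypothetical and do not refer to anything in the existing codebase:

```python
import re


def extract_snippets(text: str, name: str, radius: int = 200) -> list[str]:
    """Return one raw-text window per occurrence of `name` in `text`.

    Hypothetical helper: `radius` is the number of characters kept on
    each side of the match (the ±200 chars from the idea above).
    """
    # Lookarounds keep "Anna" from matching inside "Annabel";
    # re.escape protects names that contain regex metacharacters.
    pattern = re.compile(rf"(?<!\w){re.escape(name)}(?!\w)")
    snippets = []
    for match in pattern.finditer(text):
        # Clamp the window to the bounds of the text.
        start = max(0, match.start() - radius)
        end = min(len(text), match.end() + radius)
        snippets.append(text[start:end])
    return snippets
```

Matching is case-sensitive here; whether character names should be matched case-insensitively (e.g. for names written in capitals as speaker labels) is left open.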

When

Likely best as a post-profiling double-check pass rather than upfront: by then the profiler already has a consolidated profile, so the raw snippets serve to validate and enrich it rather than flooding the profiler with noisy raw text from the start.

Potential value

  • Catches what the structured pipeline lost
  • Cheap to compute (regex + slice)
  • Could surface missed relations, traits, background details

Open questions

  • How to handle overlapping windows (name appears multiple times in close proximity)?
  • Noise from stage directions mixed with dialogue
  • Same name potentially referring to different characters across eras
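For the overlapping-windows question, one possible approach (a sketch, not a decision) is to merge windows whose character ranges touch or overlap, so a passage that mentions the name several times is sent once. The name `merged_windows` and its parameters are hypothetical:

```python
import re


def merged_windows(text: str, name: str, radius: int = 200) -> list[str]:
    """Return raw-text windows around `name`, merging overlapping ones.

    Hypothetical helper illustrating one answer to the overlap question.
    """
    pattern = re.compile(rf"(?<!\w){re.escape(name)}(?!\w)")
    spans: list[tuple[int, int]] = []
    # finditer yields matches left to right, so spans stay sorted by start.
    for match in pattern.finditer(text):
        start = max(0, match.start() - radius)
        end = min(len(text), match.end() + radius)
        if spans and start <= spans[-1][1]:
            # Overlaps (or touches) the previous window: extend it.
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return [text[s:e] for s, e in spans]
```

This does nothing for the other two questions: stage directions and same-name characters across eras would still need separate handling.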
