API reference

`from python_source_leak_filter import strip_python_source`

`strip_python_source(text) -> (str, bool)`

Strip Python-looking content from text. Returns (filtered_text, was_stripped).

text: the string to filter (typically a language model's full text reply).
returns (filtered_text, was_stripped):
- filtered_text — text with every detected block replaced by an "implementation detail withheld" placeholder.
- was_stripped — True iff any pass removed content (use it as your audit hook).

Empty string and whitespace-only input return (text, False) and never raise.
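
As a rough illustration of the audit-hook use of `was_stripped`, the sketch below wraps the call and logs whenever something was removed. The wrapper name `filter_reply`, the logger name, and the choice to pass the filter in as a parameter are all assumptions made for this example; only the `(filtered_text, was_stripped)` return shape comes from the reference above.

```python
import logging
from typing import Callable, Tuple

logger = logging.getLogger("leak_audit")  # hypothetical logger name


def filter_reply(reply: str, strip: Callable[[str], Tuple[str, bool]]) -> str:
    """Filter a model reply and log when any content was withheld."""
    filtered, was_stripped = strip(reply)
    if was_stripped:
        # Audit hook: record that something was removed, not what it was.
        logger.warning(
            "Python-looking content stripped (%d -> %d chars)",
            len(reply),
            len(filtered),
        )
    return filtered


# Intended call site, assuming the package is installed:
#   from python_source_leak_filter import strip_python_source
#   safe_text = filter_reply(model_reply, strip_python_source)
```

Passing the filter in as an argument keeps the wrapper easy to test with a stand-in function.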

The function runs three independent passes in order:

| Pass | Detects | Mechanism |
| --- | --- | --- |
| 1 | `` ```python `` and `` ```py `` fenced blocks | regex (`re.DOTALL \| re.IGNORECASE`) |
| 2 | bare `` ``` `` fenced blocks whose body looks like Python | regex + `_looks_like_python()` gate |
| 3 | unfenced `def` / `async def` / `class` declarations + their indented body | line scan, any leading indent |
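
Pass 1 can be pictured as a single regex substitution. The following is a minimal sketch under stated assumptions, not the library's actual pattern: the names `strip_tagged_fences`, `_TAGGED_FENCE`, and `PLACEHOLDER`, and the placeholder wording, are hypothetical. Only the `python` / `py` tags and the `re.DOTALL | re.IGNORECASE` flags are taken from the table.

```python
import re
from typing import Tuple

FENCE = "`" * 3  # built at runtime so the example can itself live inside a fence
PLACEHOLDER = "[implementation detail withheld]"  # hypothetical wording

# Opening fence tagged python/py, lazily matched body, then the closing fence.
_TAGGED_FENCE = re.compile(
    re.escape(FENCE) + r"(?:python|py)\b.*?" + re.escape(FENCE),
    re.DOTALL | re.IGNORECASE,
)


def strip_tagged_fences(text: str) -> Tuple[str, bool]:
    """Replace every python/py fenced block with the placeholder."""
    filtered, count = _TAGGED_FENCE.subn(PLACEHOLDER, text)
    return filtered, count > 0
```

`re.DOTALL` lets the body span multiple lines, and `re.IGNORECASE` lets tags such as `Python` or `PY` match as well.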

A generic fence body is treated as Python if either of the following holds:

- it contains ≥ 2 of these signals (checked over the first 2000 chars): `def `, `class `, `async def`, `import `, `from `, `self.`, `return `, `raise `, `yield `, `lambda `, `if __name__`; or
- it contains `base64.b64decode` or `base64.decodebytes` together with `exec(` or `eval(` (the base64-exec smuggling vector).

This selectivity is what keeps SQL, JSON, and prose fenced blocks from being stripped.
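
The gate described above might look roughly like the sketch below. It is an illustration, not the real `_looks_like_python()`: it assumes signals are counted as distinct case-sensitive substrings, and that the base64 check scans the whole body rather than only the first 2000 characters. Neither detail is stated in this reference.

```python
SIGNALS = (
    "def ", "class ", "async def", "import ", "from ", "self.",
    "return ", "raise ", "yield ", "lambda ", "if __name__",
)
DECODERS = ("base64.b64decode", "base64.decodebytes")
EXECUTORS = ("exec(", "eval(")


def looks_like_python(body: str) -> bool:
    """Heuristic gate for a bare fenced block (sketch, assumptions above)."""
    head = body[:2000]
    # Rule 1: at least two distinct signals in the first 2000 characters.
    if sum(1 for signal in SIGNALS if signal in head) >= 2:
        return True
    # Rule 2: a base64 decoder combined with exec/eval (assumed: whole body).
    has_decoder = any(d in body for d in DECODERS)
    has_executor = any(e in body for e in EXECUTORS)
    return has_decoder and has_executor
```

Under these assumptions, a body such as `SELECT name FROM users;` matches no signal, and one containing only `import os` matches a single signal, so neither passes the gate.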

The unfenced pass (3) removes the declaration line plus every subsequent line indented more deeply than it. Nested methods are therefore removed together with their enclosing class, and the scan stops at the first line whose indentation is at or below the declaration's.
Surrounding prose outside a stripped block is preserved verbatim.
All replacements use a single placeholder string (`_LEAK_REPLACEMENT` in `filter.py`).
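
To make the unfenced pass concrete, here is a minimal line-scan sketch. It follows the indentation rule literally, so a blank line ends the body; whether the real filter treats blank lines that way is not stated here and is an assumption. The names `strip_unfenced`, `_DECL`, and `PLACEHOLDER`, and the placeholder wording, are hypothetical stand-ins.

```python
import re
from typing import List, Tuple

PLACEHOLDER = "[implementation detail withheld]"  # hypothetical wording

# A declaration at any leading indent: def, async def, or class plus a name.
_DECL = re.compile(r"^([ \t]*)(?:async[ \t]+def|def|class)[ \t]+\w+")


def _indent(line: str) -> int:
    """Number of leading whitespace characters."""
    return len(line) - len(line.lstrip())


def strip_unfenced(text: str) -> Tuple[str, bool]:
    """Replace each declaration and its more-indented body with a placeholder."""
    lines = text.split("\n")
    out: List[str] = []
    stripped = False
    i = 0
    while i < len(lines):
        match = _DECL.match(lines[i])
        if match is None:
            out.append(lines[i])  # prose is kept verbatim
            i += 1
            continue
        base = len(match.group(1))
        i += 1
        # Skip the body: every following line indented more than the declaration.
        while i < len(lines) and _indent(lines[i]) > base:
            i += 1
        out.append(PLACEHOLDER)
        stripped = True
    return "\n".join(out), stripped
```

Given a reply whose second line starts a class with a nested method, the sketch keeps the prose before and after and emits one placeholder for the whole class.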