`from python_source_leak_filter import strip_python_source`

Strip Python-looking content from text. Returns `(filtered_text, was_stripped)`.

- `text`: the string to filter (typically a language model's full text reply).
- returns `(filtered_text, was_stripped)`:
  - `filtered_text`: `text` with every detected block replaced by an "implementation detail withheld" placeholder.
  - `was_stripped`: `True` iff any pass removed content (use it as your audit hook).

Empty string and whitespace-only input return (text, False) and never raise.
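
One way to wire `was_stripped` into an audit trail is sketched below. The wrapper name `filter_reply`, the logger name, and the log message are hypothetical; the only thing taken from the description above is the `(filtered_text, was_stripped)` return shape. The filter is passed in as a callable so the sketch runs without the package installed.

```python
import logging
from typing import Callable, Tuple

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("leak_audit")

# Any callable with the documented shape: text -> (filtered_text, was_stripped).
StripFn = Callable[[str], Tuple[str, bool]]


def filter_reply(reply: str, strip: StripFn) -> str:
    """Run the filter on a model reply and log when anything was removed."""
    filtered, was_stripped = strip(reply)
    if was_stripped:
        # Audit hook: record that content was withheld, not the content itself.
        logger.warning(
            "Python-looking content stripped from reply (%d -> %d chars)",
            len(reply),
            len(filtered),
        )
    return filtered
```

In real use, `strip` would be `strip_python_source`; because empty and whitespace-only input never raises, the wrapper needs no special case for it.
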
The function runs three independent passes in order:
| Pass | Detects | Mechanism |
|---|---|---|
| 1 | `` ```python `` and `` ```py `` fenced blocks | regex (`re.DOTALL \| re.IGNORECASE`) |
| 2 | bare `` ``` `` fenced blocks whose body looks like Python | regex + `_looks_like_python()` gate |
| 3 | unfenced `def` / `async def` / `class` declarations + their indented body | line scan, any leading indent |

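
As an illustration of pass 1, a minimal sketch of a tagged-fence regex follows. It is not the library's actual pattern: the names `strip_tagged_fences` and `PLACEHOLDER` and the placeholder text are assumptions, and only the two flags come from the table above.

```python
import re
from typing import Tuple

# Hypothetical placeholder text; the real value is whatever the library defines.
PLACEHOLDER = "[implementation detail withheld]"

# Build the fence marker from single backticks so this example
# can itself sit inside a fenced block.
FENCE = "`" * 3

# Opening fence tagged python or py (any case), then everything up to the
# next closing fence. DOTALL lets "." cross newlines; IGNORECASE accepts
# tags such as Python or PY.
_TAGGED_FENCE = re.compile(
    FENCE + r"(?:python|py)\b.*?" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def strip_tagged_fences(text: str) -> Tuple[str, bool]:
    """Replace every python/py-tagged fenced block with the placeholder."""
    filtered, count = _TAGGED_FENCE.subn(PLACEHOLDER, text)
    return filtered, count > 0
```

The non-greedy `.*?` stops at the first closing fence, so two separate blocks in one reply are replaced individually rather than swallowed together with the prose between them.
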
A generic fence body is treated as Python if either:

- it contains ≥ 2 of these signals (checked over the first 2000 chars): `def`, `class`, `async def`, `import`, `from`, `self.`, `return`, `raise`, `yield`, `lambda`, `if __name__`; or
- it contains `base64.b64decode` / `base64.decodebytes` and `exec(` / `eval(`: the base64-exec smuggling vector.

This selectivity is what keeps SQL, JSON, and prose fenced blocks from being stripped.
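
The two rules above could be approximated as follows. This is a sketch rather than the real `_looks_like_python()`: the signal list and the 2000-character window come from the description, while matching signals with case-sensitive word-boundary regexes and checking the base64 rule over the whole body are assumptions.

```python
import re

# Signals taken from the list above. How each one is matched
# (case-sensitive, word-boundary regexes) is an assumption of this sketch.
_SIGNAL_PATTERNS = [
    r"\bdef\b", r"\bclass\b", r"\basync\s+def\b", r"\bimport\b", r"\bfrom\b",
    r"\bself\.", r"\breturn\b", r"\braise\b", r"\byield\b", r"\blambda\b",
    r"\bif\s+__name__\b",
]
_SIGNAL_RES = [re.compile(p) for p in _SIGNAL_PATTERNS]
_SCAN_LIMIT = 2000  # signals are only looked for in the first 2000 characters


def looks_like_python(body: str) -> bool:
    """Return True if a bare fence body passes either rule."""
    head = body[:_SCAN_LIMIT]
    # Rule 1: at least two distinct signals appear in the scanned window.
    hits = sum(1 for rx in _SIGNAL_RES if rx.search(head))
    if hits >= 2:
        return True
    # Rule 2: a base64 decode call together with exec( or eval(.
    # Checked over the whole body here, which is an assumption.
    has_decode = "base64.b64decode" in body or "base64.decodebytes" in body
    has_exec = "exec(" in body or "eval(" in body
    return has_decode and has_exec
```

With case-sensitive matching, an uppercase SQL `FROM` does not count as the `from` signal, which is one way the selectivity described above can be achieved.
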
- The unfenced pass (3) removes the declaration line plus every subsequent line indented more than it, so it correctly captures nested methods and stops at the next sibling line.
- Surrounding prose outside a stripped block is preserved verbatim.
- All replacements use a single placeholder string (`_LEAK_REPLACEMENT` in `filter.py`).
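
The unfenced pass can be pictured as a simple line scan, sketched below. The function name, the placeholder text, the requirement that a declaration name be followed by `(` or `:`, and the handling of blank lines inside a body are all assumptions; only the "declaration line plus every deeper-indented line" rule comes from the notes above.

```python
import re
from typing import List, Tuple

# Hypothetical placeholder text; the real one is _LEAK_REPLACEMENT.
PLACEHOLDER = "[implementation detail withheld]"

# Declaration line: any leading indent, then def / async def / class, a name,
# and "(" or ":". Requiring "(" or ":" is an assumption that keeps prose such
# as "class of problems" from matching.
_DECL = re.compile(r"^(\s*)(?:async\s+def|def|class)\s+\w+\s*[(:]")


def _indent(line: str) -> int:
    return len(line) - len(line.lstrip())


def strip_unfenced_declarations(text: str) -> Tuple[str, bool]:
    """Replace each declaration and its deeper-indented body with the placeholder."""
    lines = text.split("\n")
    out: List[str] = []
    stripped = False
    i = 0
    while i < len(lines):
        match = _DECL.match(lines[i])
        if match is None:
            out.append(lines[i])
            i += 1
            continue
        base = len(match.group(1))  # indent of the declaration line
        j = end = i + 1
        while j < len(lines):
            if lines[j].strip() == "":
                j += 1  # blank line: kept in the block only if a deeper line follows
            elif _indent(lines[j]) > base:
                j += 1
                end = j  # deeper-indented line extends the block
            else:
                break  # sibling or outer line ends the block
        out.append(PLACEHOLDER)
        stripped = True
        i = end
    return "\n".join(out), stripped
```

Because the comparison is against the declaration's own indent, nested methods are consumed with their enclosing class, and a line at the same indent as the declaration is left in place.
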