
@ablaszkiewicz
Contributor

@ablaszkiewicz ablaszkiewicz commented Feb 9, 2026

Our Python SDK captures code variables in error tracking. It ships with 15 default case-insensitive regex patterns for redacting variables such as password, api_key, etc.

As it turns out, regex matching is extremely slow compared to a plain `substring in string` check.

This PR converts case-insensitive regexes into simple substring matching, which is substantially faster.
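As a rough illustration of why this helps, here is a minimal sketch comparing the two approaches. The keyword list and the function names (`KEYWORDS`, `is_sensitive_regex`, `is_sensitive_substring`) are hypothetical and do not reproduce the SDK's actual 15 defaults. The equivalence assumes ASCII variable names, since `re.IGNORECASE` and `str.lower()` can disagree on a few non-ASCII characters.

```python
import re
import timeit

# Hypothetical subset of redaction keywords, for illustration only;
# the SDK's real default patterns are not reproduced here.
KEYWORDS = ["password", "api_key", "secret", "token"]

# Before: one case-insensitive regex per keyword.
REGEXES = [re.compile(rf"(?i){re.escape(k)}") for k in KEYWORDS]

# After: lowercase the variable name once, then do plain substring checks.
LOWERED_KEYWORDS = [k.lower() for k in KEYWORDS]


def is_sensitive_regex(name: str) -> bool:
    # Runs the regex engine once per pattern.
    return any(r.search(name) for r in REGEXES)


def is_sensitive_substring(name: str) -> bool:
    # Lowercases once, then uses str.__contains__ for each keyword.
    lowered = name.lower()
    return any(k in lowered for k in LOWERED_KEYWORDS)


if __name__ == "__main__":
    names = ["user_PASSWORD", "MyApi_Key", "retries", "access_token", "count"]
    for n in names:
        assert is_sensitive_regex(n) == is_sensitive_substring(n)
    for fn in (is_sensitive_regex, is_sensitive_substring):
        elapsed = timeit.timeit(lambda: [fn(n) for n in names], number=20_000)
        print(f"{fn.__name__}: {elapsed:.3f}s")
```

Lowercasing the name once and scanning with `in` avoids entering the regex engine for every pattern. Actual speedups depend on the patterns and inputs, so it is worth measuring rather than assuming.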


@greptile-apps greptile-apps bot left a comment


2 files reviewed, 2 comments


Comment on lines +935 to +938
def _extract_plain_substring(pattern):
    # Matches inline flag groups like (?i), (?ai), (?ims), etc. that include the 'i' flag.
    # Python regex flags: a=ASCII, i=IGNORECASE, L=LOCALE, m=MULTILINE, s=DOTALL, u=UNICODE, x=VERBOSE
    inline_flags = re.match(r"^\(\?[aiLmsux]*i[aiLmsux]*\)", pattern)

Substrings may be missed
`_extract_plain_substring()` rejects any pattern whose remainder contains regex metacharacters, including `[` and `]`. That means simple ignore-case patterns like `(?i)[\s_-]*api[_-]?key` (or even `(?i)api[_-]?key`) will never take the substring fast path and will still compile as regex, so the PR’s stated optimization won’t apply to a chunk of “simple” redaction patterns. If the default redaction patterns include any character classes or optional separators, consider broadening the fast-path detection to cover those cases (or update the PR description/tests to reflect which patterns are actually optimized).
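One way to act on this suggestion, sketched under assumptions: a pattern built only from literals, simple character classes such as `[_-]`, and an optional `?` matches a finite set of strings, so it can be expanded into several plain substrings. `expand_simple_pattern` is a hypothetical helper, not part of the SDK. It only handles a leading `(?i)` and returns `None` for anything it does not understand, so such patterns stay on the regex path.

```python
import itertools
import math
import re

_LEADING_IGNORECASE = re.compile(r"^\(\?i\)")
# Characters that make a pattern too complex for this simple expander.
_UNSUPPORTED = set(".^$*+{}|()\\")


def expand_simple_pattern(pattern, limit=64):
    """Hypothetical helper: expand '(?i)api[_-]?key' to {'apikey', 'api_key', 'api-key'}.

    Supports only a leading (?i), literal characters, character classes of
    literal characters (no ranges, negation or escapes), and a single '?'
    after an atom. Returns None for anything else.
    """
    flag = _LEADING_IGNORECASE.match(pattern)
    if flag is None:
        return None
    body = pattern[flag.end():]
    atoms = []  # each atom is a list of alternative strings
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "[":
            end = body.find("]", i + 1)
            if end == -1:
                return None
            members = body[i + 1:end]
            # Reject empty classes, negation, escapes, nested '[' and ranges like a-z.
            if (not members or members[0] == "^" or "\\" in members
                    or "[" in members or "-" in members[1:-1]):
                return None
            atom = list(members)
            i = end + 1
        elif ch in _UNSUPPORTED or ch in "?]":
            return None  # quantifiers, anchors, groups, escapes or stray brackets
        else:
            atom = [ch]
            i += 1
        # An optional '?' lets the atom also match the empty string.
        if i < len(body) and body[i] == "?":
            atom.append("")
            i += 1
        atoms.append(atom)
    # Refuse empty patterns and expansions that would produce too many needles.
    if not atoms or math.prod(len(a) for a in atoms) > limit:
        return None
    return {"".join(parts).lower() for parts in itertools.product(*atoms)}
```

Patterns with unbounded quantifiers, like the leading `[\s_-]*`, are not expanded here. If the patterns are applied with `re.search` (an assumption), that particular prefix can match the empty string and so never changes whether a match exists; it could be dropped before expansion.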

Path: posthog/exception_utils.py
Line: 935:938

@github-actions
Contributor

github-actions bot commented Feb 9, 2026

posthog-python Compliance Report

Date: 2026-02-09 22:22:34 UTC
Duration: 158864ms

✅ All Tests Passed!

29/29 tests passed


Capture Tests

29/29 tests passed

Test Duration (status: passed for every test)
Format Validation.Event Has Required Fields 518ms
Format Validation.Event Has Uuid 1507ms
Format Validation.Event Has Lib Properties 1507ms
Format Validation.Distinct Id Is String 1506ms
Format Validation.Token Is Present 1507ms
Format Validation.Custom Properties Preserved 1506ms
Format Validation.Event Has Timestamp 1507ms
Retry Behavior.Retries On 503 9518ms
Retry Behavior.Does Not Retry On 400 3505ms
Retry Behavior.Does Not Retry On 401 3509ms
Retry Behavior.Respects Retry After Header 9514ms
Retry Behavior.Implements Backoff 23528ms
Retry Behavior.Retries On 500 7503ms
Retry Behavior.Retries On 502 7513ms
Retry Behavior.Retries On 504 7512ms
Retry Behavior.Max Retries Respected 23529ms
Deduplication.Generates Unique Uuids 1496ms
Deduplication.Preserves Uuid On Retry 7513ms
Deduplication.Preserves Uuid And Timestamp On Retry 14512ms
Deduplication.Preserves Uuid And Timestamp On Batch Retry 7017ms
Deduplication.No Duplicate Events In Batch 1502ms
Deduplication.Different Events Have Different Uuids 1507ms
Compression.Sends Gzip When Enabled 1506ms
Batch Format.Uses Proper Batch Structure 1506ms
Batch Format.Flush With No Events Sends Nothing 1005ms
Batch Format.Multiple Events Batched Together 1505ms
Error Handling.Does Not Retry On 403 3507ms
Error Handling.Does Not Retry On 413 3508ms
Error Handling.Retries On 408 7512ms

@ablaszkiewicz ablaszkiewicz merged commit ffb8e9b into master Feb 9, 2026
21 checks passed
@ablaszkiewicz ablaszkiewicz deleted the ab/fix/further-optimize-code-variables-regex-search branch February 9, 2026 22:59


2 participants