Add AI bot classification for event enrichment #80
Open
jaredmixpanel wants to merge 5 commits into master
Unit tests for AiBotClassifier and integration tests for Mixpanel::track() bot detection enrichment. Part of AI bot classification feature for PHP SDK.
Add BotClassifier_AiBotDatabase with 12 AI bot patterns and BotClassifier_AiBotClassifier for user-agent classification. Modify Mixpanel::track() to enrich events with bot classification properties when bot_detection is enabled. Part of AI bot classification feature for PHP SDK.
Add ConsumerStrategies_BotClassifyingConsumer that wraps any consumer and enriches events with AI bot classification at persist time. Part of AI bot classification feature for PHP SDK.
Update composer.json, phpunit.xml.dist, and all existing test files to use PHPUnit 9 (PHPUnit\Framework\TestCase, void return types). Part of AI bot classification feature for PHP SDK.
Pull request overview
This PR adds AI bot classification functionality to the Mixpanel PHP library, enabling automatic detection of 12 known AI crawler user-agents and enriching tracked events with classification metadata. It also upgrades the testing infrastructure from PHPUnit 5.6 to PHPUnit 9.6.
Changes:
- Adds AI bot classification system with support for 12 AI bots (GPTBot, ClaudeBot, PerplexityBot, etc.) and optional custom patterns
- Upgrades test infrastructure to PHPUnit 9.6 with modernized test syntax (setUp/tearDown return type declarations)
- Provides two integration approaches: opt-in `bot_detection` flag or `BotClassifyingConsumer` wrapper
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| composer.json | Upgrades PHPUnit from 5.6 to 9.6 |
| phpunit.xml.dist | Removes deprecated PHPUnit configuration options |
| lib/Base/MixpanelBase.php | Adds bot_detection and bot_additional_patterns configuration options |
| lib/Mixpanel.php | Integrates bot classifier, adds getQueue() method, enriches track() with bot classification |
| lib/BotClassifier/AiBotDatabase.php | Defines database of 12 AI bot patterns with metadata |
| lib/BotClassifier/AiBotClassifier.php | Implements classification logic with custom pattern support |
| lib/ConsumerStrategies/BotClassifyingConsumer.php | Provides alternative consumer wrapper approach for bot classification |
| test/BotClassifier/AiBotClassifierTest.php | Comprehensive unit tests for classifier (all 12 bots, edge cases, custom patterns) |
| test/BotClassifier/BotClassifyingIntegrationTest.php | Integration tests for Mixpanel class with bot detection enabled |
| test/Base/MixpanelBaseProducerTest.php | Migrates to PHPUnit 9 syntax |
| test/ConsumerStrategies/*.php | Migrates all consumer test classes to PHPUnit 9 syntax |
| test/Producers/*.php | Migrates all producer test classes to PHPUnit 9 syntax |
| test/MixpanelTest.php | Migrates main test class to PHPUnit 9 syntax |
- Add bot property assertions in integration test (via flush + file content check)
- Add double-classification guard when using BotClassifyingConsumer
- Add invalid regex handling for custom patterns
Summary
Adds AI bot classification integrated into the Mixpanel class that automatically detects AI crawler requests and enriches tracked events with classification properties.
What it does
- Enriches tracked events with `$is_ai_bot`, `$ai_bot_name`, `$ai_bot_provider`, and `$ai_bot_category` properties

AI Bots Detected
GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-User, Google-Extended, PerplexityBot, Bytespider, CCBot, Applebot-Extended, Meta-ExternalAgent, cohere-ai
Implementation Details
Architecture
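- Integration: `bot_detection => true` flag on `Mixpanel::getInstance()` (enriches in `track()`) and `BotClassifyingConsumer` strategy
- Double-classification guard (`_isUsingBotClassifyingConsumer()` check in `track()`)
- Patterns are regular expressions (e.g. `'/GPTBot\//i'`)
- Invalid custom regexes are handled with `@preg_match`

Public API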
- `BotClassifier_AiBotClassifier::__construct($additional_bots = array())`
- `BotClassifier_AiBotClassifier::classify($user_agent)`: returns `['$is_ai_bot' => bool, ...]`
- `BotClassifier_AiBotClassifier::createClassifier($options = array())`: reads the `additional_bots` key from `$options`
- `BotClassifier_AiBotClassifier::getBotDatabase()`
- `BotClassifier_AiBotDatabase::getDatabase()`
- `BotClassifier_AiBotDatabase::getDatabaseForInspection()`
- `ConsumerStrategies_BotClassifyingConsumer`: classifies in `persist()`
- `Mixpanel::__construct($token, $options)`: accepts the `bot_detection` and `bot_additional_patterns` options
- `Mixpanel::track($event, $properties)`: enriches the event when `bot_detection` is enabled and `$user_agent` is present

Notable Design Decisions
- Flag-based detection (`bot_detection => true`) classifies at `track()` time before enqueueing, while `BotClassifyingConsumer` classifies at `persist()` time on the batch. This lets users choose early enrichment (properties visible in the queue) or deferred enrichment (consumer level).
- `Mixpanel::track()` calls `_isUsingBotClassifyingConsumer()` to detect when the `BotClassifyingConsumer` is already configured as the consumer strategy, preventing the same event from being classified twice.
- `array_merge($additional_bots, getDatabase())` puts custom patterns first so they are checked before built-in patterns, allowing users to override or extend classification.

Usage Examples
Flag-Based Detection
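The PR's own snippet for this mode is not reproduced here. As a rough illustration of the track-time decision described above (enrich only when `bot_detection` is on and a `$user_agent` property is present), here is a minimal self-contained sketch. It is not the SDK's code: `sketch_track_enrich`, the one-pattern `$classify` closure, and the sample user-agent string are all hypothetical.

```php
<?php
// Hypothetical stand-in for the classifier: a single pattern, for illustration only.
$classify = function (string $user_agent): array {
    if (preg_match('/GPTBot\//i', $user_agent) === 1) {
        return array('$is_ai_bot' => true, '$ai_bot_name' => 'GPTBot');
    }
    return array('$is_ai_bot' => false);
};

// Sketch of track-time enrichment: acts only when the flag is enabled and the
// event carries a `$user_agent` property; otherwise properties pass through.
function sketch_track_enrich(array $properties, array $options, callable $classify): array
{
    if (empty($options['bot_detection']) || !isset($properties['$user_agent'])) {
        return $properties;
    }
    return array_merge($properties, $classify($properties['$user_agent']));
}

// Illustrative user agent, simplified from what a real crawler sends.
$event = array('$user_agent' => 'Mozilla/5.0 (compatible; GPTBot/1.2)');

$enriched = sketch_track_enrich($event, array('bot_detection' => true), $classify);
$untouched = sketch_track_enrich($event, array(), $classify);
```

With the flag off, the event is returned unchanged, which matches the opt-in behaviour the PR describes.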
Consumer Strategy
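The consumer approach is described as a wrapper that enriches events at `persist()` time and then hands the batch to the wrapped consumer. The sketch below shows that decorator shape in isolation. The interface, class names, and the `event`/`properties` message layout are assumptions made for the example, not the SDK's actual types.

```php
<?php
// Hypothetical minimal consumer contract; the SDK's real base class differs.
interface SketchConsumer
{
    public function persist(array $batch): bool;
}

// Inner consumer that simply records what it was asked to persist.
class SketchMemoryConsumer implements SketchConsumer
{
    public $stored = array();

    public function persist(array $batch): bool
    {
        $this->stored = array_merge($this->stored, $batch);
        return true;
    }
}

// Decorator: enriches each message in the batch, then delegates to the inner consumer.
class SketchBotClassifyingConsumer implements SketchConsumer
{
    private $inner;
    private $classify;

    public function __construct(SketchConsumer $inner, callable $classify)
    {
        $this->inner = $inner;
        $this->classify = $classify;
    }

    public function persist(array $batch): bool
    {
        foreach ($batch as $i => $message) {
            $props = isset($message['properties']) ? $message['properties'] : array();
            if (isset($props['$user_agent'])) {
                $extra = ($this->classify)($props['$user_agent']);
                $batch[$i]['properties'] = array_merge($props, $extra);
            }
        }
        return $this->inner->persist($batch);
    }
}

$memory = new SketchMemoryConsumer();
$consumer = new SketchBotClassifyingConsumer($memory, function (string $ua): array {
    return array('$is_ai_bot' => preg_match('/ClaudeBot\//i', $ua) === 1);
});
$ok = $consumer->persist(array(
    array('event' => 'page_view', 'properties' => array('$user_agent' => 'ClaudeBot/1.0')),
    array('event' => 'signup', 'properties' => array('plan' => 'free')),
));
```

Because enrichment happens inside `persist()`, queued events stay unmodified until the batch is flushed, which is the deferred behaviour the PR contrasts with the flag-based mode.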
Standalone Classification
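The PR lists `classify($user_agent)` as returning an array keyed by `$is_ai_bot` and the related properties. The following sketch shows one way such a lookup could work, assuming a small pattern table. The function names, the two-entry database, and the category value are placeholders rather than the PR's actual data.

```php
<?php
// Illustrative two-entry database; the category value is a placeholder,
// not the metadata shipped in the PR.
function sketch_bot_database(): array
{
    return array(
        array('pattern' => '/GPTBot\//i', 'name' => 'GPTBot', 'provider' => 'OpenAI', 'category' => 'example-category'),
        array('pattern' => '/ClaudeBot\//i', 'name' => 'ClaudeBot', 'provider' => 'Anthropic', 'category' => 'example-category'),
    );
}

// Returns the four classification properties on a match, or only
// `$is_ai_bot => false` when nothing matches or the input is unusable.
function sketch_classify($user_agent): array
{
    if (!is_string($user_agent) || $user_agent === '') {
        return array('$is_ai_bot' => false);
    }
    foreach (sketch_bot_database() as $bot) {
        if (preg_match($bot['pattern'], $user_agent) === 1) {
            return array(
                '$is_ai_bot' => true,
                '$ai_bot_name' => $bot['name'],
                '$ai_bot_provider' => $bot['provider'],
                '$ai_bot_category' => $bot['category'],
            );
        }
    }
    return array('$is_ai_bot' => false);
}

$bot = sketch_classify('Mozilla/5.0 (compatible; ClaudeBot/1.0)');
$human = sketch_classify('Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36');
$missing = sketch_classify(null);
```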
Custom Bot Patterns
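Two behaviours described in this PR matter for custom patterns: custom entries are merged ahead of the built-in ones, and an invalid regex is tolerated rather than fatal. The sketch below illustrates both under simplified assumptions. `sketch_classify_with_custom`, the one-entry built-in list, and the `pattern`/`name` entry shape are hypothetical.

```php
<?php
function sketch_classify_with_custom(string $user_agent, array $additional_bots): array
{
    // Stand-in for the built-in database (one entry only).
    $builtin = array(
        array('pattern' => '/GPTBot\//i', 'name' => 'GPTBot'),
    );
    // Custom entries come first, so they win when both would match.
    $bots = array_merge($additional_bots, $builtin);

    foreach ($bots as $bot) {
        // `@` silences the warning an invalid pattern raises; preg_match then
        // returns false, and the strict comparison skips that entry.
        if (@preg_match($bot['pattern'], $user_agent) === 1) {
            return array('$is_ai_bot' => true, '$ai_bot_name' => $bot['name']);
        }
    }
    return array('$is_ai_bot' => false);
}

$custom = array(
    array('pattern' => '/[unclosed/', 'name' => 'Broken'),                 // invalid regex, skipped
    array('pattern' => '/GPTBot\//i', 'name' => 'MyGPTBotLabel'),          // overrides the built-in entry
    array('pattern' => '/InternalCrawler\//i', 'name' => 'InternalCrawler'), // extends the list
);

$overridden = sketch_classify_with_custom('GPTBot/1.2', $custom);
$extended = sketch_classify_with_custom('InternalCrawler/0.1', $custom);
$fallback = sketch_classify_with_custom('GPTBot/1.2', array());
```

Putting custom entries first is what makes an override possible: without it, the built-in `GPTBot` entry would match before the custom label is ever tried.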
Files Added
- lib/BotClassifier/AiBotClassifier.php
- lib/BotClassifier/AiBotDatabase.php
- lib/ConsumerStrategies/BotClassifyingConsumer.php
- test/BotClassifier/AiBotClassifierTest.php
- test/BotClassifier/BotClassifyingIntegrationTest.php

Files Modified
- composer.json
- lib/Base/MixpanelBase.php
- lib/Mixpanel.php
- phpunit.xml.dist
- test/Base/MixpanelBaseProducerTest.php
- test/ConsumerStrategies/AbstractConsumerTest.php
- test/ConsumerStrategies/CurlConsumerTest.php
- test/ConsumerStrategies/FileConsumerTest.php
- test/ConsumerStrategies/SocketConsumerTest.php
- test/MixpanelTest.php
- test/Producers/MixpanelEventsProducerTest.php
- test/Producers/MixpanelGroupsProducerTest.php
- test/Producers/MixpanelPeopleProducerTest.php

Test Plan
- Non-AI user agents return `$is_ai_bot: false` (Chrome, Googlebot, curl, etc.)