ARCHITECTURE_FULL
Version: 2.0.0
Based on: Real PATAS Core v2 code
PATAS Core is a generic engine for:
- Analyzing large message corpora (spam / not_spam)
- Automatically discovering spam patterns
- Generating machine-readable blocking rules
- Evaluating rules on real traffic
- Promoting good rules and deactivating bad ones
Key Principle: PATAS Core is generic and reusable. It works with abstract domain models and can be wrapped by different profiles (Telegram on-prem, public API, etc.) without modification.
Message: normalized message storage from external logs or CSV imports.
Fields:
- `id` - Internal ID (Integer, primary key)
- `external_id` - External message ID (String, unique, for idempotency)
- `timestamp` - Message timestamp (DateTime, timezone-aware, indexed)
- `text` - Message text content (Text, required)
- `meta` - JSON metadata (JSON, optional): channel, language, country, sender, source, etc.
- `is_spam` - Optional spam label (Boolean, optional, indexed)
- `tas_action` - TAS action (String, optional, indexed): 'blocked' / 'allowed'
- `user_complaint` - User-reported spam (Boolean, default: False, indexed)
- `unbanned` - Whether message/user was unbanned (Boolean, default: False)
- `created_at` - Creation timestamp (DateTime, auto)
Indexes:
- `ix_messages_timestamp_spam` (timestamp, is_spam)
- `ix_messages_tas_action` (tas_action)
Pattern: discovered spam patterns.
Fields:
- `id` - Pattern ID (Integer, primary key)
- `type` - Pattern type (PatternType enum, required, indexed):
  - `URL` - URL patterns
  - `PHONE` - Phone number patterns
  - `TEXT` - Text patterns
  - `META` - Metadata patterns
  - `SIGNATURE` - Message signature patterns
  - `KEYWORD` - Keyword patterns
- `description` - Human-readable description (Text, required)
- `examples` - Representative message texts (JSON array, optional)
- `created_at` - Creation timestamp (DateTime, auto)
- `updated_at` - Update timestamp (DateTime, auto)
Relations:
- `rules` - One-to-many with Rule
Rule: SQL blocking rules with lifecycle management.
Fields:
- `id` - Rule ID (Integer, primary key)
- `pattern_id` - Associated pattern (Integer, FK, optional, indexed)
- `sql_expression` - Safe SELECT query (Text, required)
- `status` - Lifecycle state (RuleStatus enum, required, indexed):
  - `CANDIDATE` - New rule created by pattern mining
  - `SHADOW` - Rule in shadow evaluation
  - `ACTIVE` - Active rule, ready for export
  - `DEPRECATED` - Deprecated rule
- `origin` - Origin (String, required, default: 'llm'): 'llm', 'pattern_mining', 'manual'
- `created_at` - Creation timestamp (DateTime, auto)
- `updated_at` - Update timestamp (DateTime, auto)
Indexes:
- `ix_rules_status_updated` (status, updated_at)
Relations:
- `pattern` - Many-to-one with Pattern
- `evaluations` - One-to-many with RuleEvaluation
RuleEvaluation: rule evaluation metrics.
Fields:
- `id` - Evaluation ID (Integer, primary key)
- `rule_id` - Associated rule (Integer, FK, required, indexed)
- `time_period_start` - Evaluation window start (DateTime, required)
- `time_period_end` - Evaluation window end (DateTime, required)
- `hits_total` - Total messages matched (Integer, required)
- `spam_hits` - Spam messages matched (Integer, required)
- `ham_hits` - Non-spam messages matched (Integer, required)
- `precision` - spam_hits / hits_total (Float, optional)
- `recall` - spam_hits / total spam count (Float, optional; requires the total spam count for the window)
- `coverage` - hits_total / total_messages (Float, optional)
- `created_at` - Creation timestamp (DateTime, auto)
Indexes:
- `ix_rule_evaluations_rule_created` (rule_id, created_at)
Relations:
- `rule` - Many-to-one with Rule
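The metric definitions above can be written out as a small helper. This is only a sketch: `compute_metrics`, `EvaluationMetrics`, and the `total_spam` / `total_messages` arguments are hypothetical names, and returning `None` for a zero denominator is an assumption chosen to match the optional Float columns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvaluationMetrics:
    hits_total: int
    spam_hits: int
    ham_hits: int
    precision: Optional[float]
    recall: Optional[float]
    coverage: Optional[float]


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator > 0 else None


def compute_metrics(hits_total: int, spam_hits: int, ham_hits: int,
                    total_spam: int, total_messages: int) -> EvaluationMetrics:
    """Derive precision, recall and coverage from raw hit counts."""
    return EvaluationMetrics(
        hits_total=hits_total,
        spam_hits=spam_hits,
        ham_hits=ham_hits,
        precision=_ratio(spam_hits, hits_total),     # spam_hits / hits_total
        recall=_ratio(spam_hits, total_spam),        # needs the total spam count
        coverage=_ratio(hits_total, total_messages)  # hits_total / total_messages
    )
```

`hits_total` is passed in rather than derived from `spam_hits + ham_hits`, because `is_spam` is optional and a rule may match unlabeled messages.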
TASLogIngester: ingests external logs into normalized Message storage.
Methods:
- `ingest_from_tas_api()` - Pull from external API (HTTP client with retry)
- `ingest_from_tas_storage()` - Read from files/DB (JSON/CSV)
- `ingest_batch()` - Idempotent batch ingestion
- `ingest_from_csv()` - CSV import
Features:
- Idempotency via `external_id`
- Support for multiple sources (API, storage, CSV)
- Large file processing via streaming
- Retry logic for HTTP requests
- Error handling: `httpx.*`, `IOError`, `OSError`
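One way to get idempotency from a unique `external_id` is to let the database reject duplicates. The sketch below uses the standard-library `sqlite3` module with a reduced, hypothetical schema. The real ingester works through its own repository layer, so the function signatures here are assumptions for illustration only.

```python
import sqlite3
from typing import Any, Iterable, Mapping


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a reduced messages table; external_id is UNIQUE for idempotency."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS messages (
            id INTEGER PRIMARY KEY,
            external_id TEXT UNIQUE,
            timestamp TEXT NOT NULL,
            text TEXT NOT NULL,
            is_spam INTEGER
        )
        """
    )


def ingest_batch(conn: sqlite3.Connection,
                 records: Iterable[Mapping[str, Any]]) -> int:
    """Insert records, skipping any external_id already stored.

    Returns the number of rows actually inserted.
    """
    inserted = 0
    for record in records:
        cursor = conn.execute(
            "INSERT OR IGNORE INTO messages (external_id, timestamp, text, is_spam) "
            "VALUES (?, ?, ?, ?)",
            (record["external_id"], record["timestamp"],
             record["text"], record.get("is_spam")),
        )
        inserted += cursor.rowcount  # 1 if inserted, 0 if ignored as a duplicate
    conn.commit()
    return inserted
```

Re-running the same batch leaves the table unchanged, so a retried or partially failed import can be repeated safely.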
PatternMiningPipeline: mines patterns from Message batches.
Methods:
- `mine_patterns()` - Main entry point
  - Parameters: `days`, `min_spam_count`, `use_llm`, `use_semantic`, `enable_llm_validation`
  - Returns: `{patterns_created, rules_created, messages_processed, spam_count, ham_count}`
- `_extract_and_aggregate()` - Feature extraction and aggregation
- `_generate_patterns_and_rules()` - Creating Pattern and Rule objects
- `_llm_pattern_discovery()` - LLM for semantic patterns
- `_process_llm_rule()` - Processing LLM-suggested rules
Features:
- Chunked processing for large datasets (`chunk_size`)
- Aggregates signals before LLM calls
- Minimizes LLM usage through compact summaries
- Creates `Pattern` records and candidate `Rule` objects
- Support for semantic mining via embedding engine
- LLM validation via `v2_sql_llm_validator`
Pattern Types:
- URL patterns
- Keyword patterns
- Signature patterns
- Semantic clusters (if enabled)
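To illustrate what aggregating signals before any LLM call could look like for URL patterns, here is a minimal sketch that counts URL domains across labeled messages and keeps those seen in at least `min_spam_count` spam messages. The function name, the regex, and the tuple layout are assumptions. The actual `_extract_and_aggregate()` logic is not shown in this document.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s]+")


def aggregate_url_domains(messages: Iterable[Tuple[str, bool]],
                          min_spam_count: int) -> List[Tuple[str, int, int]]:
    """Return (domain, spam_count, ham_count) for frequently spammed domains.

    `messages` yields (text, is_spam) pairs. Each domain is counted at most
    once per message.
    """
    spam_counts: Counter = Counter()
    ham_counts: Counter = Counter()
    for text, is_spam in messages:
        domains = {urlparse(url).netloc.lower() for url in URL_RE.findall(text)}
        domains.discard("")
        (spam_counts if is_spam else ham_counts).update(domains)

    candidates = [
        (domain, count, ham_counts[domain])
        for domain, count in spam_counts.items()
        if count >= min_spam_count
    ]
    # Most frequent spam domains first; ties broken alphabetically.
    return sorted(candidates, key=lambda item: (-item[1], item[0]))
```

Keeping the ham count next to the spam count lets later stages discard domains that also appear often in legitimate traffic.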
RuleLifecycleService: manages the lifecycle state machine for rules.
States: candidate → shadow → active → deprecated
Methods:
- `create_candidate_rule()` - Creates a new rule in candidate status
- `move_to_shadow()` - Transition candidate → shadow (for evaluation)
- `promote_to_active()` - Promotion shadow → active (after successful evaluation)
- `deprecate_rule()` - Deprecates a rule (from any status)
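The lifecycle above can be modeled as an explicit transition table. The following is a minimal sketch, not the service's actual code: the enum string values and the `transition` helper are assumptions, and `DEPRECATED` is treated as a terminal state.

```python
from enum import Enum
from typing import Dict, Set


class RuleStatus(Enum):
    CANDIDATE = "candidate"
    SHADOW = "shadow"
    ACTIVE = "active"
    DEPRECATED = "deprecated"


# Forward path is candidate -> shadow -> active; deprecation is allowed
# from every non-deprecated state.
ALLOWED_TRANSITIONS: Dict[RuleStatus, Set[RuleStatus]] = {
    RuleStatus.CANDIDATE: {RuleStatus.SHADOW, RuleStatus.DEPRECATED},
    RuleStatus.SHADOW: {RuleStatus.ACTIVE, RuleStatus.DEPRECATED},
    RuleStatus.ACTIVE: {RuleStatus.DEPRECATED},
    RuleStatus.DEPRECATED: set(),
}


def transition(current: RuleStatus, target: RuleStatus) -> RuleStatus:
    """Return the new status, or raise ValueError for an illegal transition."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping the table in one place means a rule cannot skip shadow evaluation on its way to active.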
ShadowEvaluationService: evaluates rules in shadow mode on real data.
Methods:
- `evaluate_rule(rule_id, days)` - Evaluates a single rule
- `evaluate_all_shadow_rules(days)` - Evaluates all shadow rules
Process:
- Executes SQL from `sql_expression` on real messages
- Calculates metrics: `hits_total`, `spam_hits`, `ham_hits`
- Computes: `precision`, `coverage`
- Creates `RuleEvaluation` records
Features:
- SQL safety validation via `v2_sql_safety`
- Error handling: `SQLSafetyError`, `SQLAlchemyError`
- Minimum sample size for evaluation
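The evaluation process above can be sketched against an in-memory SQLite table. This assumes the rule's `sql_expression` selects a single `id` column from `messages` and has already passed safety validation. The `min_sample_size` parameter and the returned dictionary layout are hypothetical.

```python
import sqlite3
from typing import Any, Dict


def evaluate_rule(conn: sqlite3.Connection, sql_expression: str,
                  min_sample_size: int = 1) -> Dict[str, Any]:
    """Run a validated rule against messages and compute shadow metrics."""
    total_messages = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]

    # The rule is embedded as a subquery, which is only acceptable because
    # it is assumed to have been validated as a safe SELECT beforehand.
    query = (
        "SELECT COUNT(*), "
        "COALESCE(SUM(CASE WHEN m.is_spam = 1 THEN 1 ELSE 0 END), 0), "
        "COALESCE(SUM(CASE WHEN m.is_spam = 0 THEN 1 ELSE 0 END), 0) "
        f"FROM messages AS m WHERE m.id IN ({sql_expression})"
    )
    hits_total, spam_hits, ham_hits = conn.execute(query).fetchone()

    # Below the minimum sample size the precision is left undefined.
    enough = hits_total >= min_sample_size and hits_total > 0
    return {
        "hits_total": hits_total,
        "spam_hits": spam_hits,
        "ham_hits": ham_hits,
        "precision": spam_hits / hits_total if enough else None,
        "coverage": hits_total / total_messages if total_messages else None,
    }
```

Leaving `precision` as `None` for small samples keeps a rule with two lucky hits from looking perfect.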
PromotionService: automatic promotion and rollback of rules.
Methods:
- `promote_shadow_rules()` - Promotion shadow → active
- `monitor_active_rules()` - Monitoring and deprecation of degrading rules
- `export_active_rules(backend_type)` - Export to SQL/ROL formats
AggressivenessProfile:
- `conservative()` - min_precision=0.95, max_coverage=0.05, max_ham_hits=5
- `balanced()` - min_precision=0.90, max_coverage=0.10, max_ham_hits=10
- `aggressive()` - min_precision=0.85, max_coverage=0.20, max_ham_hits=20
Promotion process:
- Gets shadow rules
- Checks metrics from `RuleEvaluation`
- Compares with `AggressivenessProfile` thresholds
- Promotes if metrics match
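The threshold comparison can be sketched as follows, using the three profiles listed above. The `should_promote` helper is hypothetical, and treating the thresholds as inclusive bounds is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggressivenessProfile:
    min_precision: float
    max_coverage: float
    max_ham_hits: int

    @classmethod
    def conservative(cls) -> "AggressivenessProfile":
        return cls(min_precision=0.95, max_coverage=0.05, max_ham_hits=5)

    @classmethod
    def balanced(cls) -> "AggressivenessProfile":
        return cls(min_precision=0.90, max_coverage=0.10, max_ham_hits=10)

    @classmethod
    def aggressive(cls) -> "AggressivenessProfile":
        return cls(min_precision=0.85, max_coverage=0.20, max_ham_hits=20)


def should_promote(profile: AggressivenessProfile, precision: float,
                   coverage: float, ham_hits: int) -> bool:
    """A shadow rule qualifies only if all three thresholds are satisfied."""
    return (
        precision >= profile.min_precision
        and coverage <= profile.max_coverage
        and ham_hits <= profile.max_ham_hits
    )
```

For example, a rule with precision 0.92, coverage 0.08 and 8 ham hits would pass the balanced and aggressive profiles but fail the conservative one.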
Rule backend (`v2_rule_backend`): exports rules to various formats.
Interface:
- `RuleBackend` - Abstract interface
- `SqlRuleBackend` - SQL export
- `RolRuleBackend` - ROL format export
- `create_rule_backend(backend_type)` - Factory function
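A minimal sketch of how an abstract backend and a factory function might fit together is shown below. The simplified `Rule` dataclass, the registry dictionary, and the SQL output layout are assumptions made for illustration. The ROL format is not described in this document, so it is left out.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Type


@dataclass
class Rule:
    """Simplified stand-in for the Rule model."""
    id: int
    sql_expression: str


class RuleBackend(ABC):
    @abstractmethod
    async def export_rules(self, rules: List[Rule]) -> str:
        """Serialize rules into the backend's target format."""


class SqlRuleBackend(RuleBackend):
    async def export_rules(self, rules: List[Rule]) -> str:
        # One commented, semicolon-terminated statement per rule.
        return "\n".join(f"-- rule {r.id}\n{r.sql_expression};" for r in rules)


# Hypothetical registry mapping backend_type strings to implementations.
_BACKENDS: Dict[str, Type[RuleBackend]] = {"sql": SqlRuleBackend}


def create_rule_backend(backend_type: str) -> RuleBackend:
    try:
        backend_cls = _BACKENDS[backend_type]
    except KeyError:
        raise ValueError(f"unknown backend type: {backend_type!r}") from None
    return backend_cls()


if __name__ == "__main__":
    backend = create_rule_backend("sql")
    rules = [Rule(1, "SELECT id FROM messages WHERE text LIKE '%x%'")]
    print(asyncio.run(backend.export_rules(rules)))
```

With a registry like this, adding an export format means registering one more class, and callers never touch concrete backend types.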
LLM engine (`v2_llm_engine`): LLM integration for pattern discovery.
Interface:
- `PatternMiningEngine` - Abstract interface
- `OpenAIPatternMiningEngine` - OpenAI implementation
- `create_mining_engine()` - Factory function
Usage:
- Only for offline pattern discovery
- Not used for real-time classification
- Optional (can be disabled)
Embedding engine (`v2_embedding_engine`): embeddings for semantic mining.
Interface:
- `EmbeddingEngine` - Abstract interface
- `OpenAIEmbeddingEngine` - OpenAI implementation
- `create_embedding_engine()` - Factory function
Usage:
- Semantic clustering of similar messages
- Finds semantically similar patterns
- Used in `PatternMiningPipeline` when `use_semantic=true`
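Semantic clustering over embeddings can be illustrated with a simple greedy scheme based on cosine similarity. This document does not say which clustering algorithm the pipeline actually uses, so the approach, the function names, and the 0.9 threshold below are all assumptions.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def greedy_cluster(embeddings: Sequence[Sequence[float]],
                   threshold: float = 0.9) -> List[List[int]]:
    """Group message indices whose embeddings are close to a cluster's first member.

    Each vector joins the first cluster whose seed (first member) is at least
    `threshold` similar; otherwise it starts a new cluster.
    """
    clusters: List[List[int]] = []
    for index, vector in enumerate(embeddings):
        for cluster in clusters:
            if cosine_similarity(vector, embeddings[cluster[0]]) >= threshold:
                cluster.append(index)
                break
        else:
            clusters.append([index])
    return clusters
```

In practice the vectors would come from an `EmbeddingEngine` implementation; toy two-dimensional vectors are enough to see the grouping behavior.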
SQL safety (`v2_sql_safety`): validation and sanitization of SQL rules.
Methods:
- `validate_sql_rule()` - Validation of SQL rules
- `sanitize_sql_for_evaluation()` - Sanitization for execution
Features:
- Whitelist of tables/columns
- Protection against SQL injection
- Check for "match everything" rules
- Only SELECT queries allowed
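The checks listed above can be approximated with a few string-level tests. This is a deliberately simple sketch, not the real validator: the `SQLSafetyError` class here is a local stand-in, the table whitelist and keyword list are assumptions, and a production validator would parse the SQL rather than rely on regular expressions.

```python
import re

ALLOWED_TABLES = {"messages"}  # assumed whitelist
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|attach|pragma|truncate)\b",
    re.IGNORECASE,
)


class SQLSafetyError(ValueError):
    """Raised when a rule's SQL fails a safety check (local stand-in)."""


def validate_sql_rule(sql: str) -> str:
    """Return the normalized statement, or raise SQLSafetyError."""
    stripped = sql.strip().rstrip(";").strip()
    if ";" in stripped or "--" in stripped or "/*" in stripped:
        raise SQLSafetyError("multiple statements or comments are not allowed")
    if not re.match(r"(?i)select\b", stripped):
        raise SQLSafetyError("only SELECT queries are allowed")
    # Limitation: a keyword inside a string literal is rejected as well.
    if FORBIDDEN.search(stripped):
        raise SQLSafetyError("forbidden keyword in rule")

    tables = re.findall(r"(?i)\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_]*)", stripped)
    if not tables or {t.lower() for t in tables} - ALLOWED_TABLES:
        raise SQLSafetyError("rule references a table outside the whitelist")

    # Reject rules that would match every message.
    where = re.search(r"(?i)\bwhere\b(.*)", stripped, re.DOTALL)
    if where is None or re.fullmatch(r"\s*(1\s*=\s*1|true)\s*",
                                     where.group(1), re.IGNORECASE):
        raise SQLSafetyError("rule matches everything")
    return stripped
```

Rejecting anything ambiguous is the safer default here, since a false rejection only costs one candidate rule.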
SQL LLM validator (`v2_sql_llm_validator`): LLM validation of SQL rules.
Methods:
- `validate_rule_with_llm()` - LLM validation of SQL rules
Features:
- Check for false positives
- Risk assessment
- Optional (if LLM is available)
1. Ingestion:
TASLogIngester.ingest_batch()
→ MessageRepository.create()
→ Messages in DB
2. Pattern Mining:
PatternMiningPipeline.mine_patterns()
→ Pattern extraction (URL, keyword, signature, semantic)
→ LLM pattern discovery (optional)
→ PatternRepository.create()
→ RuleRepository.create() (status=CANDIDATE)
3. Shadow Evaluation:
ShadowEvaluationService.evaluate_rule()
→ SQL execution on real messages
→ Metric calculation
→ RuleEvaluationRepository.create()
4. Promotion:
PromotionService.promote_shadow_rules()
→ Checking metrics against AggressivenessProfile
→ RuleLifecycleService.transition() (SHADOW → ACTIVE)
→ Export via RuleBackend
Implement the RuleBackend interface for custom export formats:

    from typing import List

    from app.v2_rule_backend import RuleBackend

    # Rule is the PATAS Core rule model; import it from where your codebase defines it.
    class CustomRuleBackend(RuleBackend):
        async def export_rules(self, rules: List[Rule]) -> str:
            # Your implementation
            pass

Implement PatternMiningEngine for custom LLM providers:

    from typing import Dict, List

    from app.v2_llm_engine import PatternMiningEngine

    class CustomLLMEngine(PatternMiningEngine):
        async def discover_patterns(self, signals: Dict, examples: List[str]) -> Dict:
            # Your implementation
            pass

Implement EmbeddingEngine for custom embedding providers:

    from typing import List

    from app.v2_embedding_engine import EmbeddingEngine

    class CustomEmbeddingEngine(EmbeddingEngine):
        async def embed_texts(self, texts: List[str]) -> List[List[float]]:
            # Your implementation
            pass

- Whitelist of tables/columns
- Only SELECT queries
- Protection against SQL injection
- Validation before execution
- Privacy modes: STANDARD / STRICT
- PII redaction
- Logs do not store full texts in STRICT mode
- Idempotency via `external_id`
- No hardcoded secrets
- All settings via environment variables
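To make the privacy bullets concrete, here is a rough sketch of how log output could differ between the two modes. Everything in it is an assumption: the `PrivacyMode` enum, the regexes, the placeholder tokens, and the choice to log a length plus a truncated SHA-256 digest in STRICT mode.

```python
import hashlib
import re
from enum import Enum


class PrivacyMode(Enum):
    STANDARD = "standard"
    STRICT = "strict"


EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def loggable_text(text: str, mode: PrivacyMode) -> str:
    """Return what may be written to logs for a message under the given mode."""
    if mode is PrivacyMode.STRICT:
        # STRICT: never log the text itself, only a length and a short digest
        # that still lets operators correlate repeated messages.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        return f"<redacted len={len(text)} sha256={digest}>"
    return redact_pii(text)
```

The mode itself would be read from an environment variable, in line with the configuration rule above.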
- API Reference — API documentation
- Integration Guide — integration
- Configuration Guide — settings