ARCHITECTURE_FULL

PATAS Architecture - Complete Documentation

Version: 2.0.0
Based on: Real PATAS Core v2 code

Overview

PATAS Core is a generic engine for:

Analyzing large message corpora (spam / not_spam)
Automatically discovering spam patterns
Generating machine-readable blocking rules
Evaluating rules on real traffic
Promoting good rules and deactivating bad ones

Key Principle: PATAS Core is generic and reusable. It works with abstract domain models and can be wrapped by different profiles (Telegram on-prem, public API, etc.) without modification.

Data Models

Message

Normalized message storage from external logs or CSV imports.

Fields:

id - Internal ID (Integer, primary key)
external_id - External message ID (String, unique, for idempotency)
timestamp - Message timestamp (DateTime, timezone-aware, indexed)
text - Message text content (Text, required)
meta - JSON metadata (JSON, optional): channel, language, country, sender, source, etc.
is_spam - Optional spam label (Boolean, optional, indexed)
tas_action - TAS action (String, optional, indexed): 'blocked' / 'allowed'
user_complaint - User-reported spam (Boolean, default: False, indexed)
unbanned - Whether message/user was unbanned (Boolean, default: False)
created_at - Creation timestamp (DateTime, auto)

Indexes:

ix_messages_timestamp_spam (timestamp, is_spam)
ix_messages_tas_action (tas_action)

Pattern

Discovered spam patterns.

Fields:

id - Pattern ID (Integer, primary key)
type - Pattern type (PatternType enum, required, indexed):
- URL - URL patterns
- PHONE - Phone number patterns
- TEXT - Text patterns
- META - Metadata patterns
- SIGNATURE - Message signature patterns
- KEYWORD - Keyword patterns
description - Human-readable description (Text, required)
examples - Representative message texts (JSON array, optional)
created_at - Creation timestamp (DateTime, auto)
updated_at - Update timestamp (DateTime, auto)

Relations:

rules - One-to-many with Rule

Rule

SQL blocking rules with lifecycle management.

Fields:

id - Rule ID (Integer, primary key)
pattern_id - Associated pattern (Integer, FK, optional, indexed)
sql_expression - Safe SELECT query (Text, required)
status - Lifecycle state (RuleStatus enum, required, indexed):
- CANDIDATE - New rule created by pattern mining
- SHADOW - Rule in shadow evaluation
- ACTIVE - Active rule, ready for export
- DEPRECATED - Deprecated rule
origin - Origin (String, required, default: 'llm'): 'llm', 'pattern_mining', 'manual'
created_at - Creation timestamp (DateTime, auto)
updated_at - Update timestamp (DateTime, auto)

Indexes:

ix_rules_status_updated (status, updated_at)

Relations:

pattern - Many-to-one with Pattern
evaluations - One-to-many with RuleEvaluation

RuleEvaluation

Rule evaluation metrics.

Fields:

id - Evaluation ID (Integer, primary key)
rule_id - Associated rule (Integer, FK, required, indexed)
time_period_start - Evaluation window start (DateTime, required)
time_period_end - Evaluation window end (DateTime, required)
hits_total - Total messages matched (Integer, required)
spam_hits - Spam messages matched (Integer, required)
ham_hits - Non-spam messages matched (Integer, required)
precision - spam_hits / hits_total (Float, optional)
recall - spam_hits / total spam messages in the evaluation window; requires the total spam count (Float, optional)
coverage - hits_total / total_messages (Float, optional)
created_at - Creation timestamp (DateTime, auto)

Indexes:

ix_rule_evaluations_rule_created (rule_id, created_at)

Relations:

rule - Many-to-one with Rule
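The ratio fields of RuleEvaluation follow directly from the raw counts. The snippet below is a minimal sketch of that arithmetic, not PATAS code: `EvaluationMetrics` and `compute_metrics` are hypothetical names, and leaving a ratio as `None` when its denominator is zero or unknown is an assumption based on the fields being optional.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvaluationMetrics:
    """Hypothetical container mirroring the RuleEvaluation metric fields."""
    hits_total: int
    spam_hits: int
    ham_hits: int
    precision: Optional[float]
    recall: Optional[float]
    coverage: Optional[float]


def compute_metrics(hits_total: int, spam_hits: int, ham_hits: int,
                    total_messages: int,
                    total_spam: Optional[int] = None) -> EvaluationMetrics:
    """Derive precision, recall and coverage from raw hit counts."""
    # Assumption: a ratio stays None when its denominator is zero or unknown.
    precision = spam_hits / hits_total if hits_total else None
    recall = spam_hits / total_spam if total_spam else None
    coverage = hits_total / total_messages if total_messages else None
    return EvaluationMetrics(hits_total, spam_hits, ham_hits,
                             precision, recall, coverage)


# A rule that matched 50 of 1000 messages: 45 spam, 5 ham, 90 spam overall.
metrics = compute_metrics(hits_total=50, spam_hits=45, ham_hits=5,
                          total_messages=1000, total_spam=90)
```

Note that hits_total can exceed spam_hits + ham_hits when some matched messages carry no is_spam label, which is why it is passed in rather than computed.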

Core Services

1. TASLogIngester (`app/v2_ingestion.py`)

Ingests external logs into normalized Message storage.

Methods:

ingest_from_tas_api() - Pull from external API (HTTP client with retry)
ingest_from_tas_storage() - Read from files/DB (JSON/CSV)
ingest_batch() - Idempotent batch ingestion
ingest_from_csv() - CSV import

Features:

Idempotency via external_id
Support for multiple sources (API, storage, CSV)
Large file processing via streaming
Retry logic for HTTP requests
Error Handling: httpx.*, IOError, OSError
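Idempotency via external_id can be illustrated with a unique constraint plus an insert that skips duplicates. This is a sketch under stated assumptions, not the actual ingest_batch() implementation: it uses an in-memory SQLite table with a reduced set of Message columns, and the helper names `create_schema` and `ingest_rows` are hypothetical.

```python
import sqlite3
from typing import Dict, Iterable


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a reduced messages table; external_id is UNIQUE."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS messages (
            id INTEGER PRIMARY KEY,
            external_id TEXT UNIQUE,
            timestamp TEXT NOT NULL,
            text TEXT NOT NULL
        )
        """
    )


def ingest_rows(conn: sqlite3.Connection,
                batch: Iterable[Dict[str, str]]) -> int:
    """Insert rows, skipping known external_ids. Returns rows inserted."""
    inserted = 0
    for row in batch:
        cur = conn.execute(
            "INSERT OR IGNORE INTO messages (external_id, timestamp, text) "
            "VALUES (?, ?, ?)",
            (row["external_id"], row["timestamp"], row["text"]),
        )
        inserted += cur.rowcount  # 1 if inserted, 0 if ignored as duplicate
    conn.commit()
    return inserted


conn = sqlite3.connect(":memory:")
create_schema(conn)
batch = [
    {"external_id": "m-1", "timestamp": "2024-01-01T00:00:00+00:00", "text": "hello"},
    {"external_id": "m-2", "timestamp": "2024-01-01T00:01:00+00:00", "text": "win a prize"},
]
first_run = ingest_rows(conn, batch)
second_run = ingest_rows(conn, batch)  # replaying the same batch adds nothing
```

Replaying a batch after a partial failure is then safe, because the database, not the caller, decides whether a message is new.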

2. PatternMiningPipeline (`app/v2_pattern_mining.py`)

Mines patterns from Message batches.

Methods:

mine_patterns() - Main entry point
- Parameters: days, min_spam_count, use_llm, use_semantic, enable_llm_validation
- Returns: {patterns_created, rules_created, messages_processed, spam_count, ham_count}
_extract_and_aggregate() - Extracts features and aggregates them
_generate_patterns_and_rules() - Creates Pattern and Rule objects
_llm_pattern_discovery() - Uses the LLM to discover semantic patterns
_process_llm_rule() - Processes LLM-suggested rules

Features:

Chunked processing for large datasets (chunk_size)
Aggregates signals before LLM calls
Minimizes LLM usage through compact summaries
Creates Pattern records and candidate Rule objects
Support for semantic mining via embedding engine
LLM validation via v2_sql_llm_validator

Pattern Types:

URL patterns
Keyword patterns
Signature patterns
Semantic clusters (if enabled)
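To make "aggregates signals before LLM calls" concrete, here is a minimal sketch of counting URL domains and keywords across spam messages and keeping only those seen often enough. It is illustrative only: `aggregate_signals`, the regular expressions, and the output shape are assumptions, and only the `min_spam_count` parameter name comes from the description above.

```python
import re
from collections import Counter
from typing import Dict, List
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://\S+")
WORD_RE = re.compile(r"[a-z]{4,}")  # crude keyword candidate: 4+ letters


def _frequent(counts: Counter, minimum: int) -> Dict[str, int]:
    """Keep only entries seen in at least `minimum` messages."""
    return {key: n for key, n in counts.items() if n >= minimum}


def aggregate_signals(spam_texts: List[str],
                      min_spam_count: int = 2) -> Dict[str, Dict[str, int]]:
    """Count URL domains and keywords, once per message, across spam texts."""
    domains: Counter = Counter()
    keywords: Counter = Counter()
    for text in spam_texts:
        lowered = text.lower()
        urls = URL_RE.findall(lowered)
        # Sets ensure a signal repeated inside one message counts once.
        domains.update({urlparse(url).netloc for url in urls})
        without_urls = URL_RE.sub(" ", lowered)
        keywords.update(set(WORD_RE.findall(without_urls)))
    return {
        "url_domains": _frequent(domains, min_spam_count),
        "keywords": _frequent(keywords, min_spam_count),
    }


signals = aggregate_signals([
    "Win a prize now http://spam.example/a",
    "Claim your prize at http://spam.example/b",
    "Cheap loans today",
])
```

A compact summary like `signals` is far smaller than the raw corpus, which is the point of aggregating before any LLM call.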

3. RuleLifecycleService (`app/v2_rule_lifecycle.py`)

Manages the lifecycle state machine for rules.

States: candidate → shadow → active → deprecated

Methods:

create_candidate_rule() - Creates a new rule in candidate status
move_to_shadow() - Transitions a rule from candidate to shadow (for evaluation)
promote_to_active() - Promotes a rule from shadow to active (after successful evaluation)
deprecate_rule() - Deprecates a rule (from any status)
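The lifecycle above can be sketched as a small transition table. This is an illustration rather than the service's code: the enum values, the `ALLOWED_TRANSITIONS` table, and the `transition` helper are hypothetical, and treating deprecated as a terminal state is an assumption.

```python
from enum import Enum
from typing import Dict, Set


class RuleStatus(Enum):
    CANDIDATE = "candidate"
    SHADOW = "shadow"
    ACTIVE = "active"
    DEPRECATED = "deprecated"


# candidate -> shadow -> active, with deprecation allowed from any live state.
ALLOWED_TRANSITIONS: Dict[RuleStatus, Set[RuleStatus]] = {
    RuleStatus.CANDIDATE: {RuleStatus.SHADOW, RuleStatus.DEPRECATED},
    RuleStatus.SHADOW: {RuleStatus.ACTIVE, RuleStatus.DEPRECATED},
    RuleStatus.ACTIVE: {RuleStatus.DEPRECATED},
    RuleStatus.DEPRECATED: set(),  # assumption: terminal state
}


class InvalidTransition(ValueError):
    """Raised when a requested status change is not in the table."""


def transition(current: RuleStatus, target: RuleStatus) -> RuleStatus:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransition(f"{current.value} -> {target.value}")
    return target


status = RuleStatus.CANDIDATE
status = transition(status, RuleStatus.SHADOW)
status = transition(status, RuleStatus.ACTIVE)
```

Keeping the allowed moves in one table makes it hard to skip shadow evaluation by accident: a direct candidate to active request raises instead of silently succeeding.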

4. ShadowEvaluationService (`app/v2_shadow_evaluation.py`)

Evaluates rules in shadow mode on real data.

Methods:

evaluate_rule(rule_id, days) - Evaluates a single rule
evaluate_all_shadow_rules(days) - Evaluates all shadow rules

Process:

Executes SQL from sql_expression on real messages
Calculates metrics: hits_total, spam_hits, ham_hits
Computes: precision, coverage
Creates RuleEvaluation records

Features:

SQL safety validation via v2_sql_safety
Error Handling: SQLSafetyError, SQLAlchemyError
Minimum sample size for evaluation
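The counting step can be sketched against an in-memory SQLite table. The assumptions here are significant: the rule's sql_expression is taken to be a SELECT that returns message ids, the table is a reduced `messages` table, and `count_rule_hits` is a hypothetical helper. The expression is interpolated into the query, so in practice it must already have passed SQL safety validation.

```python
import sqlite3
from typing import Dict


def count_rule_hits(conn: sqlite3.Connection,
                    sql_expression: str) -> Dict[str, int]:
    """Count total, spam and ham hits for a rule that selects message ids."""
    # Assumption: sql_expression was validated beforehand; it is embedded
    # as a subquery, so an unvalidated string must never reach this point.
    query = (
        "SELECT COUNT(*), "
        "COALESCE(SUM(m.is_spam = 1), 0), "
        "COALESCE(SUM(m.is_spam = 0), 0) "
        f"FROM messages AS m WHERE m.id IN ({sql_expression})"
    )
    hits_total, spam_hits, ham_hits = conn.execute(query).fetchone()
    return {"hits_total": hits_total, "spam_hits": spam_hits, "ham_hits": ham_hits}


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, text TEXT NOT NULL, is_spam INTEGER)"
)
conn.executemany(
    "INSERT INTO messages (id, text, is_spam) VALUES (?, ?, ?)",
    [
        (1, "win a prize", 1),
        (2, "prize inside", 1),
        (3, "school prize ceremony", 0),
        (4, "hello", 0),
        (5, "free prize", None),  # unlabeled message
    ],
)
hits = count_rule_hits(conn, "SELECT id FROM messages WHERE text LIKE '%prize%'")
```

The unlabeled message counts toward hits_total but toward neither spam_hits nor ham_hits, because comparisons with NULL are ignored by SUM.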

5. PromotionService (`app/v2_promotion.py`)

Automatic promotion and rollback of rules.

Methods:

promote_shadow_rules() - Promotion shadow → active
monitor_active_rules() - Monitoring and deprecation of degrading rules
export_active_rules(backend_type) - Export to SQL/ROL formats

AggressivenessProfile:

conservative() - min_precision=0.95, max_coverage=0.05, max_ham_hits=5
balanced() - min_precision=0.90, max_coverage=0.10, max_ham_hits=10
aggressive() - min_precision=0.85, max_coverage=0.20, max_ham_hits=20

Promotion process:

Gets shadow rules
Checks metrics from RuleEvaluation
Compares with AggressivenessProfile thresholds
Promotes if metrics match
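The threshold comparison can be sketched as follows. The three profile presets use the values listed above; the `should_promote` helper and the exact comparison operators (at least min_precision, at most max_coverage and max_ham_hits) are assumptions about how the thresholds are meant to be read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggressivenessProfile:
    min_precision: float
    max_coverage: float
    max_ham_hits: int

    @classmethod
    def conservative(cls) -> "AggressivenessProfile":
        return cls(min_precision=0.95, max_coverage=0.05, max_ham_hits=5)

    @classmethod
    def balanced(cls) -> "AggressivenessProfile":
        return cls(min_precision=0.90, max_coverage=0.10, max_ham_hits=10)

    @classmethod
    def aggressive(cls) -> "AggressivenessProfile":
        return cls(min_precision=0.85, max_coverage=0.20, max_ham_hits=20)


def should_promote(profile: AggressivenessProfile, precision: float,
                   coverage: float, ham_hits: int) -> bool:
    """Hypothetical check: every threshold of the profile must hold."""
    return (
        precision >= profile.min_precision
        and coverage <= profile.max_coverage
        and ham_hits <= profile.max_ham_hits
    )


# The same evaluation passes under balanced but not under conservative.
passes_balanced = should_promote(AggressivenessProfile.balanced(), 0.92, 0.04, 3)
passes_conservative = should_promote(AggressivenessProfile.conservative(), 0.92, 0.04, 3)
```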

6. RuleBackend (`app/v2_rule_backend.py`)

Export rules to various formats.

Interface:

RuleBackend - Abstract interface
SqlRuleBackend - SQL export
RolRuleBackend - ROL format export
create_rule_backend(backend_type) - Factory function
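A factory of this kind is commonly a lookup from backend_type to a class. The sketch below uses stand-in classes that reuse the names above for readability only: the `"sql"` key, the registry, the error handling, and the one-statement-per-line output are all assumptions, and rules are plain SQL strings here rather than Rule objects.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class RuleBackend(ABC):
    """Stand-in for the abstract export interface."""

    @abstractmethod
    async def export_rules(self, rules: List[str]) -> str:
        ...


class SqlRuleBackend(RuleBackend):
    """Stand-in SQL exporter: one terminated statement per line."""

    async def export_rules(self, rules: List[str]) -> str:
        return "\n".join(f"{rule};" for rule in rules)


# Hypothetical registry; a ROL backend would be registered the same way.
_BACKENDS: Dict[str, Type[RuleBackend]] = {"sql": SqlRuleBackend}


def create_rule_backend(backend_type: str) -> RuleBackend:
    """Instantiate the backend registered under backend_type."""
    try:
        return _BACKENDS[backend_type]()
    except KeyError:
        raise ValueError(f"Unknown backend type: {backend_type!r}") from None


exported = asyncio.run(
    create_rule_backend("sql").export_rules(["SELECT 1", "SELECT 2"])
)
```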

7. LLM Engine (`app/v2_llm_engine.py`)

LLM integration for pattern discovery.

Interface:

PatternMiningEngine - Abstract interface
OpenAIPatternMiningEngine - OpenAI implementation
create_mining_engine() - Factory function

Usage:

Only for offline pattern discovery
Not used for real-time classification
Optional (can be disabled)

8. Embedding Engine (`app/v2_embedding_engine.py`)

Embedding engine for semantic mining.

Interface:

EmbeddingEngine - Abstract interface
OpenAIEmbeddingEngine - OpenAI implementation
create_embedding_engine() - Factory function

Usage:

Semantic clustering of similar messages
Finds semantically similar patterns
Used in PatternMiningPipeline if use_semantic=true
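Semantic clustering over embeddings can be sketched with cosine similarity and a greedy pass. The document does not say which clustering algorithm PATAS uses, so this is a swapped-in illustration: `cosine`, `greedy_cluster`, and the 0.9 threshold are hypothetical, and the toy 2-D vectors stand in for real embeddings.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def greedy_cluster(embeddings: List[List[float]],
                   threshold: float = 0.9) -> List[List[int]]:
    """Assign each vector to the first cluster whose representative is close.

    Returns clusters as lists of indices into `embeddings`.
    """
    clusters: List[List[int]] = []
    for index, vector in enumerate(embeddings):
        for members in clusters:
            # The first member acts as the cluster's representative.
            if cosine(vector, embeddings[members[0]]) >= threshold:
                members.append(index)
                break
        else:
            clusters.append([index])
    return clusters


clusters = greedy_cluster([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.05, 1.0]])
```

Each resulting cluster can then be summarized by a few example texts, which fits the compact-summary approach described for pattern mining.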

9. SQL Safety (`app/v2_sql_safety.py`)

Validation and sanitization of SQL rules.

Methods:

validate_sql_rule() - Validation of SQL rules
sanitize_sql_for_evaluation() - Sanitization for execution

Features:

Whitelist of tables/columns
Protection against SQL injection
Check for "match everything" rules
Only SELECT queries allowed
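The checks listed above can be illustrated with a deliberately conservative, keyword-based validator. This is a sketch, not the logic in v2_sql_safety: `check_rule_sql`, the whitelist contents, and the regular expressions are assumptions, and a production validator would more likely parse the SQL than pattern-match it.

```python
import re
from typing import List

ALLOWED_TABLES = {"messages"}  # assumed whitelist
FORBIDDEN_RE = re.compile(
    r"\b(insert|update|delete|drop|alter|create|attach|pragma)\b", re.IGNORECASE
)
TABLE_RE = re.compile(r"\b(?:from|join)\s+([a-z_][a-z0-9_]*)", re.IGNORECASE)
TRIVIAL_WHERE_RE = re.compile(r"\s*(1\s*=\s*1|true)\s*", re.IGNORECASE)


def check_rule_sql(sql: str) -> List[str]:
    """Return a list of problems; an empty list means the rule passed."""
    problems: List[str] = []
    stripped = sql.strip().rstrip(";").strip()
    if not stripped.lower().startswith("select"):
        problems.append("only SELECT queries are allowed")
    if ";" in stripped:
        problems.append("multiple statements are not allowed")
    # Conservative: also rejects these words inside string literals.
    if FORBIDDEN_RE.search(stripped):
        problems.append("forbidden keyword")
    for table in TABLE_RE.findall(stripped):
        if table.lower() not in ALLOWED_TABLES:
            problems.append(f"table not whitelisted: {table}")
    where = re.search(r"\bwhere\b(.*)", stripped, re.IGNORECASE | re.DOTALL)
    if where is None or TRIVIAL_WHERE_RE.fullmatch(where.group(1)):
        problems.append("rule would match every message")
    return problems


ok = check_rule_sql("SELECT id FROM messages WHERE text LIKE '%prize%'")
too_broad = check_rule_sql("SELECT id FROM messages")
```

Returning every problem instead of failing on the first one makes rejected LLM-suggested rules easier to diagnose.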

10. SQL LLM Validator (`app/v2_sql_llm_validator.py`)

LLM validation of SQL rules.

Methods:

validate_rule_with_llm() - LLM validation of SQL rules

Features:

Check for false positives
Risk assessment
Optional (if LLM is available)

Data Flow

Typical Workflow

1. Ingestion:
   TASLogIngester.ingest_batch() 
   → MessageRepository.create()
   → Messages in the DB

2. Pattern Mining:
   PatternMiningPipeline.mine_patterns()
   → Pattern extraction (URL, keyword, signature, semantic)
   → LLM pattern discovery (optional)
   → PatternRepository.create()
   → RuleRepository.create() (status=CANDIDATE)

3. Shadow Evaluation:
   ShadowEvaluationService.evaluate_rule()
   → SQL execution on real messages
   → Metric calculation
   → RuleEvaluationRepository.create()

4. Promotion:
   PromotionService.promote_shadow_rules()
   → Checking metrics against AggressivenessProfile
   → RuleLifecycleService.promote_to_active() (SHADOW → ACTIVE)
   → Export via RuleBackend

Extension Points

1. Rule Backend

Implement RuleBackend interface for custom export formats:

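from typing import List

from app.v2_rule_backend import RuleBackend

class CustomRuleBackend(RuleBackend):
    # "Rule" is the model described under Data Models; it is quoted here
    # because its import path is not shown in this document.
    async def export_rules(self, rules: List["Rule"]) -> str:
        # Your implementation
        raise NotImplementedError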

2. LLM Engine

Implement PatternMiningEngine for custom LLM providers:

from typing import Dict, List

from app.v2_llm_engine import PatternMiningEngine

class CustomLLMEngine(PatternMiningEngine):
    async def discover_patterns(self, signals: Dict, examples: List[str]) -> Dict:
        # Your implementation
        raise NotImplementedError

3. Embedding Engine

Implement EmbeddingEngine for custom embedding providers:

from typing import List

from app.v2_embedding_engine import EmbeddingEngine

class CustomEmbeddingEngine(EmbeddingEngine):
    async def embed_texts(self, texts: List[str]) -> List[List[float]]:
        # Your implementation
        raise NotImplementedError

Security

SQL Safety

Whitelist of tables/columns
Only SELECT queries
Protection against SQL injection
Validation before execution

Privacy

Privacy modes: STANDARD / STRICT
PII redaction
Logs do not store full texts in STRICT mode
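As an illustration of PII redaction and mode-dependent logging, the sketch below masks e-mail addresses and phone-like numbers and withholds message text entirely in STRICT mode. Everything beyond the two mode names is an assumption: the regular expressions, the `redact_pii` and `log_preview` helpers, and the choice to log only the text length in STRICT mode.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")  # rough phone-like pattern


def redact_pii(text: str) -> str:
    """Mask e-mail addresses and phone-like numbers."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def log_preview(text: str, mode: str = "STANDARD", limit: int = 40) -> str:
    """Return what would be written to a log for this message."""
    if mode == "STRICT":
        # Assumption: STRICT logs carry no message text at all.
        return f"<redacted, {len(text)} chars>"
    return redact_pii(text)[:limit]


redacted = redact_pii("Call +1 (555) 123-4567 or mail bob@example.com")
strict_line = log_preview("secret text", mode="STRICT")
```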

Data Access

Idempotency via external_id
No hardcoded secrets
All settings via environment variables

Additional Resources

API Reference - API documentation
Integration Guide - integration
Configuration Guide - settings
