AI-Powered Log Anomaly Detection System
Anomalyze is a backend-driven anomaly detection system designed to identify suspicious behavior in application logs using machine learning.
It simulates real-world security scenarios such as:
- Brute force login attacks
- API scraping bots
- Data exfiltration attempts
- Off-hours suspicious activity
The system processes logs, builds behavioral features per IP, and detects anomalies using an Isolation Forest model.
Swagger UI: https://anomalyze-5ayj.onrender.com/docs
Click Authorize 🔒 and enter your API key to test all endpoints.
The demo runs on a free hosting tier, so the first request may take about 30 seconds while the service wakes up.
- Backend: FastAPI
- Database: SQLite (SQLAlchemy ORM)
- Machine Learning: Scikit-learn (Isolation Forest)
- Data Processing: Pandas
- Environment: Python 3.x
app/
├── api/ # API routes
├── core/ # Config, security
├── db/ # Database setup
├── models/ # DB models
├── ml/ # ML pipeline
│ ├── feature_builder.py
│ ├── trainer.py
│ ├── anomaly_detector.py
│ └── model_store.py
├── services/ # Business logic
└── main.py # Entry point
scripts/
├── seed.py # Generate synthetic logs
├── run_training.py # Train ML model
└── verify.py # Validate pipeline
- Generates realistic system activity
- Includes both normal and malicious patterns
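As a rough illustration of mixing normal and malicious traffic, here is a minimal sketch of a synthetic log generator. The log schema (`ip`, `timestamp`, `endpoint`, `status`), the helper names, and the IP ranges are assumptions made for this example; the actual `seed.py` may work differently.

```python
import random
from datetime import datetime, timedelta

# Hypothetical log schema (ip, timestamp, endpoint, status); the real seed
# script may store different fields.

def make_normal_log(rng, base):
    """One benign request during business hours from an internal IP."""
    return {
        "ip": f"10.0.0.{rng.randint(1, 50)}",
        "timestamp": base + timedelta(hours=rng.randint(9, 17),
                                      minutes=rng.randint(0, 59)),
        "endpoint": rng.choice(["/login", "/items", "/profile"]),
        "status": rng.choices([200, 404, 500], weights=[95, 4, 1])[0],
    }

def make_brute_force_logs(base, ip, attempts=200):
    """Repeated failed logins from one IP, one per second, starting at 03:00."""
    start = base + timedelta(hours=3)
    return [
        {"ip": ip, "timestamp": start + timedelta(seconds=i),
         "endpoint": "/login", "status": 401}
        for i in range(attempts)
    ]

def generate_logs(n_normal=1000, seed=42):
    """Return a shuffled list of normal logs plus one brute-force burst."""
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    base = datetime(2024, 1, 1)
    logs = [make_normal_log(rng, base) for _ in range(n_normal)]
    logs += make_brute_force_logs(base, ip="203.0.113.7")
    rng.shuffle(logs)
    return logs

logs = generate_logs()
```

Only the brute-force pattern is shown here; scraper, exfiltration, and off-hours patterns could be added as further generator functions in the same style.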
- Aggregates logs per IP
- Extracts:
  - Request count
  - Failed login ratio
  - Error rate
  - Off-hours activity
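The per-IP aggregation above could be sketched with Pandas roughly as follows. The column names, the `/login` endpoint, the use of status 401 to mark a failed login, and the 08:00 to 20:00 definition of normal hours are all assumptions for illustration, not the project's confirmed schema.

```python
import pandas as pd

def build_features(logs: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw log rows into one feature row per IP."""
    df = logs.copy()
    df["timestamp"] = pd.to_datetime(df["timestamp"])

    # Per-row flags (assumed conventions: 401 on /login is a failed login,
    # any status >= 400 is an error, hours outside 08:00-20:00 are off-hours).
    df["login_attempt"] = df["endpoint"] == "/login"
    df["failed_login"] = df["login_attempt"] & (df["status"] == 401)
    df["is_error"] = df["status"] >= 400
    hour = df["timestamp"].dt.hour
    df["off_hours"] = (hour < 8) | (hour >= 20)

    grouped = df.groupby("ip")
    features = pd.DataFrame({
        "request_count": grouped.size(),
        "login_attempts": grouped["login_attempt"].sum(),
        "failed_logins": grouped["failed_login"].sum(),
        "error_rate": grouped["is_error"].mean(),
        "off_hours_ratio": grouped["off_hours"].mean(),
    })

    # Failed logins divided by login attempts; 0 when an IP never hit /login.
    attempts = features["login_attempts"].where(features["login_attempts"] > 0)
    features["failed_login_ratio"] = (features["failed_logins"] / attempts).fillna(0.0)

    return features[["request_count", "failed_login_ratio",
                     "error_rate", "off_hours_ratio"]]

logs = pd.DataFrame([
    {"ip": "10.0.0.1", "timestamp": "2024-01-01 10:00:00", "endpoint": "/items", "status": 200},
    {"ip": "10.0.0.1", "timestamp": "2024-01-01 11:00:00", "endpoint": "/login", "status": 200},
    {"ip": "203.0.113.7", "timestamp": "2024-01-01 03:00:00", "endpoint": "/login", "status": 401},
    {"ip": "203.0.113.7", "timestamp": "2024-01-01 03:00:01", "endpoint": "/login", "status": 401},
])
features = build_features(logs)
```

The result is indexed by IP, with one numeric column per feature, which is a convenient shape to pass to a scikit-learn model.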
- Uses Isolation Forest for anomaly detection
- Automatically flags suspicious IPs
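To show how Isolation Forest flags an outlying IP, here is a small self-contained sketch using scikit-learn. The feature matrix is synthetic and the hyperparameters (`n_estimators`, `contamination`) are illustrative assumptions, not the values used by this project's trainer.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic per-IP features, in the column order:
# request_count, failed_login_ratio, error_rate, off_hours_ratio
normal = np.column_stack([
    rng.normal(50, 10, 200),
    rng.uniform(0.0, 0.1, 200),
    rng.uniform(0.0, 0.05, 200),
    rng.uniform(0.0, 0.1, 200),
])
attacker = np.array([[5000.0, 0.98, 0.95, 1.0]])  # extreme on every feature
X = np.vstack([normal, attacker])

# contamination is the assumed share of anomalous IPs; it sets the threshold.
model = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
model.fit(X)

labels = model.predict(X)            # 1 = inlier, -1 = anomaly
scores = model.decision_function(X)  # lower score = more anomalous
flagged = np.where(labels == -1)[0]  # row indices of flagged IPs
```

Because `contamination` fixes the fraction of points labeled anomalous, a few borderline normal rows may be flagged alongside the attacker; ranking by `scores` gives a more graded view than the binary labels.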
- Saves trained models for reuse
- Prevents retraining on every run
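One common way to avoid retraining on every run is a load-or-train helper. The sketch below uses `joblib`, which scikit-learn's documentation recommends for persisting estimators; the function name and file path are hypothetical and may not match what `model_store.py` does.

```python
from pathlib import Path

import joblib
from sklearn.ensemble import IsolationForest

def load_or_train(model_path: Path, X):
    """Return (model, trained_now).

    Loads a previously saved model if the file exists; otherwise trains a new
    Isolation Forest on X, saves it to model_path, and returns it.
    """
    if model_path.exists():
        return joblib.load(model_path), False

    model = IsolationForest(random_state=42).fit(X)
    model_path.parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, model_path)
    return model, True
```

Calling `load_or_train(Path("models/iforest.joblib"), X)` twice trains on the first call and loads from disk on the second. A saved model should only be loaded from a trusted location, since joblib files are pickle-based.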
- Ensures:
  - Data availability
  - Model existence
  - Inference correctness
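The three checks above might look something like the following sketch. The function name, its arguments, and the definition of "inference correctness" (one label per input row, each either 1 or -1) are assumptions for illustration; the real `verify.py` may check different things.

```python
from pathlib import Path

def verify_pipeline(log_count, model_path, predict, sample):
    """Run three sanity checks and return a dict of check name -> bool.

    log_count  : number of log rows available (hypothetical input)
    model_path : where the trained model is expected on disk
    predict    : callable mapping feature rows to labels (1 or -1)
    sample     : a few feature rows to run through predict
    """
    results = {
        "data_available": log_count > 0,
        "model_exists": Path(model_path).is_file(),
    }
    try:
        labels = list(predict(sample))
        # Expect exactly one label per row, using Isolation Forest's 1 / -1.
        results["inference_ok"] = (
            len(labels) == len(sample) and set(labels) <= {1, -1}
        )
    except Exception:
        # Any failure during prediction counts as a failed check.
        results["inference_ok"] = False
    return results
```

Passing `model.predict` from a fitted scikit-learn model as `predict` works here, since its NumPy integer labels compare equal to the plain integers 1 and -1.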
- Seed Data: `python -m app.scripts.seed`
- Train Model: `python -m app.scripts.run_training`
- Verify Pipeline: `python -m app.scripts.verify`
- Run Server: `uvicorn app.main:app --reload`
- Detects anomalous IPs based on:
  - High request bursts
  - Suspicious login patterns
  - Unusual access timing
| Attack Type | Behavior |
|---|---|
| Brute Force | Repeated login failures |
| Scraper | High API volume + rotating user agents |
| Exfiltration | Bulk data access & exports |
| Off-hours Access | Activity during unusual hours |
Current limitations:
- Uses synthetic data (not production logs)
- Limited feature depth
- No real-time streaming (batch-based)
Planned improvements:
- Real-time log ingestion (Kafka / streaming)
- Advanced feature engineering (time-series behavior)
- Dashboard for visualization
- Model evaluation metrics (precision, recall)
- Role-based API security
Built as a learning-focused project to explore:
- Backend system design
- Machine learning integration
- Security analytics concepts
This project demonstrates:
- End-to-end ML pipeline integration
- Clean backend architecture
- Real-world problem simulation
“Not just detecting anomalies — understanding behavior.”