Skip to content

OthmaneAbder2303/log_classification_system

Repository files navigation

LogPulse — Hybrid Intelligence Log Classification System

Python 3.11+ FastAPI BERT Groq Docker

LogPulse is a high-performance, tiered classification engine designed to categorize logs with maximum precision. It bridges the gap between fast deterministic rules and advanced AI reasoning.


🚀 The Hybrid Escalation Pipeline

Standard classifiers often fail when logs become verbose or unpredictable. LogPulse uses a 3-Tier Escalation strategy to ensure no log goes unclassified:

  1. ⚡ Tier 1: Regex Engine (Deterministic)
    • Scans for high-frequency, known patterns.
    • Best for: known error codes and system heartbeat logs.
  2. 🧠 Tier 2: BERT Semantic Layer (Deep Learning)
    • Uses sentence-transformers to find semantic similarity even when wording differs.
    • Best for: contextual errors like "Database unreachable" vs "Connection to DB timed out."
  3. 🤖 Tier 3: LLM Logic (Generative AI)
    • Calls the Groq API for complex, unstructured, or rare log events.
    • Best for: root cause analysis and edge cases that remain unclassified.

🛠️ Tech Stack

  • Backend: FastAPI
  • ML/NLP: BERT (sentence-transformers), scikit-learn
  • LLM Provider: Groq LPU™
  • DevOps: Docker, python-dotenv

Note: the Groq LLM model has been updated — replaced deepseek-r1-distill-llama-70b (deprecated) with llama-3.3-70b-versatile for improved performance and long-term support.


📦 Installation & Setup

1. Configure Secrets

Create a .env file in the project root (do not commit this file; a .env.example template is provided):

GROQ_API_KEY=your_groq_api_key_here

2. Docker Deployment (Recommended)

Build and run the Docker image (example):

# Build the image
docker build -t log-classifier .

# Run the container, exposing port 8000 and loading environment variables from .env
docker run -p 8000:8000 --env-file .env logpulse-app

The container expects the application to listen on 0.0.0.0:8000 inside the container.

3. Local Development

# Clone and install
git clone https://github.com/OthmaneAbder2303/log_classification_system.git
cd log_classification_system
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Run the server locally
python server.py

📊 API Usage

Endpoint: POST /classify

Example request body:

{
  "source": "ModernHR",
  "log_message": "Case escalation for ticket ID 7324 failed because agent is inactive."
}

Example response:

{
  "source": "ModernHR",
  "message": "Case escalation for ticket ID 7324 failed...",
  "target_label": "Workflow Error",
  "classification_method": "LLM_Groq"
}

🛡️ Security & Git

  • Keep secrets in .env and never commit them. Use the provided .env.example as a template.
  • Add a .dockerignore to keep images small (.git, venv, __pycache__, *.ipynb, etc.).
  • The repository is configured to exclude .env, .idea/, and __pycache__/ in .gitignore.

🛣️ Future Roadmap

  • Streaming Support: Integration with Apache Kafka for real-time log ingestion.
  • Dashboard: A React/Tailwind frontend for error trend visualization.
  • Auto-Tuning: Use LLM outputs to automatically generate and suggest new Regex rules.
  • More LLM Providers: Add support for OpenAI, Hugging Face Inference API, etc.

✉️ Contact

Othmane Abderrazik

About

LogPulse: A hybrid log classification engine leveraging Regex, BERT embeddings, and Groq-powered LLMs. Features a source-aware escalation pipeline for high-precision log categorization and semantic analysis.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors