giovi321/log-triage

log-triage

log-triage is a Python tool that sits between your log collector (for example Fluent Bit) and an LLM. It filters noisy logs, detects problems, and gives you a dashboard and API-ready payloads so you can triage faster.

In addition to raw LLM prompting, log-triage can run in RAG (Retrieval-Augmented Generation) mode: it indexes your documentation repositories and automatically retrieves relevant snippets to ground AI responses with context and citations.

Overview

Key concepts

  • Pipelines: Reusable recipes that define how to group log lines, which regexes to ignore or count, and which prompt template to use for LLM payloads.
  • Modules: Runtime bindings that attach a pipeline to a file path and decide whether to scan once (batch) or tail continuously (follow) with rotation awareness.
  • Findings: Structured outputs for each grouped chunk, including severity (WARNING/ERROR/CRITICAL), counts, and optional LLM payloads.
  • Addressed & false positives: Workflow flags in the dashboard; marking a false positive also writes an ignore regex back to the pipeline to prevent repeats.
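To show how these concepts fit together, here is a minimal configuration sketch. The key names and layout are illustrative assumptions, not the tool's verified schema; refer to the shipped config.yaml for the real structure:

```yaml
# Hypothetical sketch: one pipeline (the reusable recipe) bound to one
# module (the runtime binding to a file). Key names are assumptions.
pipelines:
  rsnapshot:
    grouping: marker               # carve the stream per backup run
    ignore_regexes:
      - "known harmless warning"   # dropped before counting
    prompt_template: prompts/rsnapshot.txt

modules:
  nightly-backup:
    pipeline: rsnapshot
    path: /var/log/rsnapshot.log
    mode: follow                   # tail continuously, rotation-aware
```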

How it works

log-triage watches your logs, passes them through a configured pipeline, and surfaces only the important pieces:

  1. Collect: Point a module at a log file (or directory) to read entries once or continuously with rotation handling.
  2. Group: Apply the pipeline's grouping strategy (whole-file or marker-based) to carve the stream into logical chunks.
  3. Classify: Count warnings and errors with regex rules, ignore known-noise patterns, and assign a severity.
  4. Enrich: Generate an LLM payload per finding using your prompt template and context lines.
  5. Ground with RAG (optional): Retrieve relevant documentation snippets from your configured knowledge sources and append them to the prompt to improve accuracy and add citations.
  6. Deliver: Print findings, send alerts (webhook/MQTT), store them for the Web UI, and use the dashboard to reclassify, mark false positives, or update severity.
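As an illustration of step 3, the classify stage can be sketched roughly as below. The regex rules, severity thresholds, and function name are assumptions for illustration, not log-triage's actual implementation:

```python
import re

# Sketch of the "Classify" step: drop known-noise lines first, count
# warnings/errors with regexes, then assign a severity. The patterns and
# the threshold for CRITICAL are illustrative assumptions.
WARNING_RE = re.compile(r"\bWARN(ING)?\b")
ERROR_RE = re.compile(r"\bERROR\b")
IGNORE_RES = [re.compile(r"harmless deprecation notice")]

def classify(chunk: list[str]) -> dict:
    # Apply per-pipeline ignore rules before counting.
    lines = [l for l in chunk if not any(rx.search(l) for rx in IGNORE_RES)]
    warnings = sum(bool(WARNING_RE.search(l)) for l in lines)
    errors = sum(bool(ERROR_RE.search(l)) for l in lines)
    if errors >= 5:
        severity = "CRITICAL"
    elif errors:
        severity = "ERROR"
    elif warnings:
        severity = "WARNING"
    else:
        severity = None
    return {"warnings": warnings, "errors": errors, "severity": severity}
```

A finding produced this way carries exactly the fields described above: counts plus a severity that downstream steps (enrichment, delivery) can gate on.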

Getting started

  1. Install the package:

    python3 -m venv .venv
    source .venv/bin/activate
    pip install --upgrade pip
    # Interactive installer — prompts for extras and CPU vs GPU PyTorch
    python scripts/install.py

    The installer asks which optional extras you want (webui, alerts, rag) and, if RAG is selected, whether to use a CPU-only (~200 MB) or GPU/CUDA (~2+ GB) build of PyTorch. Choose CPU unless you are running GPU inference on this machine.

    Manual install (skip the prompt)
    # CPU-only RAG build (recommended for most machines)
    pip install ".[webui,alerts,rag]" --extra-index-url https://download.pytorch.org/whl/cpu
    
    # GPU/CUDA RAG build
    pip install ".[webui,alerts,rag]"
  2. Configure: Copy config.yaml and edit pipelines/modules to point at your log files.

  3. Run a module:

    logtriage --config ./config.yaml run --module <module-name>
  4. Open the dashboard (optional):

    export LOGTRIAGE_CONFIG=./config.yaml
    logtriage-webui

    Visit http://127.0.0.1:8090 to review findings, adjust severity, or mark false positives.

  5. Start the RAG service (optional, for improved performance):

    logtriage-rag --config ./config.yaml

    The RAG service listens on port 8091 and provides documentation retrieval. While it is running, the Web UI and CLI automatically use it for better performance.

Documentation

See the full documentation.

Security note: The Web UI is not designed to be exposed to the public internet: it lacks CSRF protection, hardened session cookies, and other security controls. Run it only on trusted networks and see the documentation for the full disclaimer.

Features

  • YAML configuration for both pipelines and modules
  • Multiple pipelines, selected by name or filename regex
  • Grouping strategies (each in its own module):
    • whole-file grouping
    • marker-based grouping (for example per rsnapshot run)
  • Classifiers (each in its own module):
    • generic regex counter
    • rsnapshot-specific heuristic
  • Per-pipeline ignore rules (ignore_regexes) to drop known-noise lines before counting
  • Severity levels:
    • WARNING, ERROR, CRITICAL
  • Batch mode (scan file or directory once)
  • Follow mode (continuous tail of a single log file), rotation-aware (tail -F style)
  • Optional config change detection for follow-mode modules to auto-reload after saving via the Web UI (--reload-on-change)
  • Optional LLM payload generation with conservative gating and per-pipeline prompt templates
  • Multiple LLM provider support:
    • OpenAI and any OpenAI-compatible API (local vLLM, Ollama, Azure OpenAI, etc.)
    • Anthropic Claude (native API: claude-3-5-sonnet, claude-3-opus, etc.)
    • Provider auto-detection: pointing api_base at api.anthropic.com selects the Anthropic backend automatically
  • Per-module options for:
    • context lines included ahead of each finding (llm.context_prefix_lines)
    • alert hooks (alerts.mqtt, alerts.webhook)
  • Optional SQL database integration for storing per-finding records (SQLite or Postgres)
  • Web UI (FastAPI) to:
    • log in with username/password (bcrypt)
    • view modules and per-module stats (last severity, 24h error/warning counts, etc.)
    • inspect and edit config.yaml (atomic writes, with backup)
    • experiment with regexes (regex lab) and save them to classifiers
    • run on a dark-mode layout
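To make the grouping strategies concrete, marker-based grouping can be sketched as follows. The function name and the example marker are hypothetical, not the tool's API; the real marker would be whatever banner your log (for example rsnapshot) prints at the start of each run:

```python
import re
from typing import Iterable, Iterator

# Sketch of marker-based grouping: start a new chunk whenever a line
# matches the marker regex, so each chunk covers one logical run.
# The default pattern is a made-up example, not a built-in of log-triage.
def group_by_marker(lines: Iterable[str],
                    marker: str = r"^=== run start ===") -> Iterator[list[str]]:
    marker_re = re.compile(marker)
    chunk: list[str] = []
    for line in lines:
        if marker_re.search(line) and chunk:
            yield chunk          # close the previous run's chunk
            chunk = []
        chunk.append(line)
    if chunk:
        yield chunk              # flush the final chunk
```

Whole-file grouping is the degenerate case: the entire file becomes a single chunk, which then flows through the same classify/enrich steps.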

License

This project is licensed under the GNU GPL v3.0. See LICENSE for details.
