ParttimeWorks
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
Lines changed: 41 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 65 additions & 60 deletions
diff --git a/linux_edr/aggregator.py b/linux_edr/aggregator.py
Lines changed: 2 additions & 12 deletions
diff --git a/linux_edr/app.py b/linux_edr/app.py
Lines changed: 17 additions & 25 deletions
diff --git a/linux_edr/cli.py b/linux_edr/cli.py
Lines changed: 2 additions & 12 deletions
@@ -0,0 +1,41 @@
+name: Documentation
+
+on:
+  push:
+    branches:
+      - master
+    paths:
+      - 'docs/**'
+      - 'mkdocs.yml'
+      - '.github/workflows/docs.yml'
+  # Allow manual trigger
+  workflow_dispatch:
+
+permissions:
+  contents: write
+
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+      
+      - name: Setup Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+      
+      - name: Install dependencies
+        run: |
+          pip install uv && uv pip install --system mkdocs-material mkdocstrings mkdocstrings-python mike mkdocs-git-revision-date-localized-plugin
+      
+      - name: Configure Git user
+        run: |
+          git config --local user.email "github-actions[bot]@users.noreply.github.com"
+          git config --local user.name "github-actions[bot]"
+      
+      - name: Deploy docs
+        run: |
+          mkdocs gh-deploy --force 
@@ -5,19 +5,22 @@
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 [![Monthly Build](https://github.com/yourusername/linux-edr/actions/workflows/test-and-publish.yml/badge.svg?event=schedule)](https://github.com/yourusername/linux-edr/actions/workflows/test-and-publish.yml)
 
-A lightweight Endpoint Detection and Response (EDR) tool for Linux systems.
+A lightweight yet comprehensive Endpoint Detection and Response (EDR) solution for Linux systems that monitors command execution, analyzes system behavior, and provides actionable security insights with minimal performance impact.
 
-## Features
+## Overview
 
-- Non-blocking trace reader for `/sys/kernel/tracing/trace_pipe`
-- Thread-safe event aggregation with memory protection
-- Process-focused event collection and grouping
-- Scheduled summarization and reporting every 15 minutes
-- OpenAI integration (gpt-4o-mini) for automated threat analysis
-- Configurable output formats (JSON, console)
-- Flexible configuration via config.ini
-- Type-safe implementation with comprehensive error handling
-- Privacy-respecting design (see [Privacy Policy](PRIVACY.md))
+Linux EDR captures process execution data through Linux's kernel tracing capabilities and builds a multi-tiered reporting structure that allows for both real-time threat detection and long-term security trend analysis. By focusing on command execution patterns, it provides valuable security insights without the overhead of traditional EDR solutions.
+
+## Key Features
+
+- **Efficient Monitoring**: Non-blocking trace reader for `/sys/kernel/tracing/trace_pipe` with automatic recovery
+- **Scalable Architecture**: Thread-safe event buffer with configurable capacity and age limits
+- **Smart Data Organization**: Process-focused event collection with commands grouped by process name
+- **Hierarchical Reporting**: Tiered reports from 15-minute snapshots to monthly trend analysis
+- **AI-Enhanced Security**: OpenAI integration with gpt-4o-mini for automated threat detection
+- **Flexible Output**: Configurable reporting to JSON files or console
+- **Production-Ready**: Comprehensive error handling with graceful recovery from failures
+- **Privacy-Focused**: Collects only necessary command execution data (see [Privacy Policy](PRIVACY.md))
 
 ## Installation
 
@@ -47,7 +50,7 @@ linux-edr show-config
 
 ## Data Structure
 
-Linux EDR groups execve events by process name and maintains the full command:
+Linux EDR groups execve events by process name and maintains the full command line for context:
 
 ```json
 {
@@ -106,28 +109,28 @@ include_raw_events = true
 
 ## Hierarchical Reporting Architecture
 
-Linux EDR uses a hierarchical reporting system to provide insights at different time scales:
+Linux EDR uses a multi-tiered reporting system that provides security visibility at different time scales:
 
 | Level | Coverage            | Name           | Description                                          |
 |:-----:|:--------------------|:---------------|:-----------------------------------------------------|
-| 1     | 15 minutes          | **Cell**       | Base unit covering a 15-minute interval              |
-| 2     | 16 Cells = 4 hours  | **Block**      | Aggregates 16 Cells (4 hours of activity)            |
-| 3     | 6 Blocks = 24 hours | **DailyReport**| Consolidates 6 Blocks (full day of activity)         |
-| 4     | 7 DailyReports      | **WeeklyReport**| Analyzes 7 daily reports (week-long patterns)       |
-| 5     | ~4 WeeklyReports    | **MonthlyReport**| Long-term analysis of approximately 4 weeks        |
+| 1     | 15 minutes          | **Cell**       | Base unit capturing immediate system activity        |
+| 2     | 16 Cells = 4 hours  | **Block**      | Short-term patterns across multiple Cells            |
+| 3     | 6 Blocks = 24 hours | **DailyReport**| Consolidated view of a full day's activity           |
+| 4     | 7 DailyReports      | **WeeklyReport**| Week-long trends with daily breakdowns              |
+| 5     | ~4 WeeklyReports    | **MonthlyReport**| Strategic view of monthly security posture         |
 
-This multi-level approach enables:
-- Immediate detection of suspicious activity (Cell level)
-- Short-term pattern recognition (Block level)
-- Daily security posture assessment (DailyReport)
-- Weekly trend analysis (WeeklyReport)
-- Monthly strategic security reviews (MonthlyReport)
+This architecture enables:
+- **Immediate threat detection** at the Cell level
+- **Context-rich pattern recognition** at the Block level
+- **Daily security posture assessment** in DailyReports
+- **Trend identification** in WeeklyReports
+- **Strategic security planning** with MonthlyReports
 
-All reports are stored in JSON format under the configured `reports_dir` with subdirectories for each level.
+All reports are stored in JSON format under the configured `reports_dir`, with one subdirectory per level.
 
 ## Systemd Service
 
-Linux EDR can be run as a systemd service:
+Linux EDR can be deployed as a systemd service for continuous monitoring:
 
 1. Copy the service file to systemd directory:
    ```bash
@@ -151,66 +154,68 @@ Linux EDR can be run as a systemd service:
    sudo systemctl status linux-edr.service
    ```
 
-## Automated Analysis
+## Automated Security Analysis
 
-The tool sends process execution data to OpenAI's gpt-4o-mini model for analysis every 15 minutes (configurable). The AI looks for suspicious patterns like:
+Linux EDR leverages OpenAI's gpt-4o-mini model to analyze process execution patterns and identify potential security threats. The analysis focuses on:
 
-- Unusual command execution patterns
+- Unusual command execution patterns and frequencies
 - Potential privilege escalation attempts
-- Data exfiltration attempts
-- Unusual network access
-- Suspicious file operations
+- Command sequences indicating data exfiltration
+- Anomalous network access patterns
+- Suspicious file operations or permission changes
 
-Analysis results are saved alongside the JSON reports with the `.analysis` extension.
+Analysis results are saved alongside JSON reports with the `.analysis` extension, providing actionable insights without requiring manual review of raw data.
 
 ## Privacy and System Impact
 
-Linux EDR is designed to be non-invasive and privacy-respecting:
+Linux EDR is designed with privacy and performance in mind:
 
-- Only monitors execve syscalls, not file contents or keystrokes
-- Stores data locally by default
-- Transmits data externally only with explicit configuration
-- Uses minimal system resources
-- Gracefully handles various error conditions
-- See our full [Privacy Policy](PRIVACY.md)
+- Collects only process execution data, not file contents or user input
+- Stores data locally by default with configurable retention
+- Transmits data externally only when explicitly configured
+- Uses non-blocking I/O and efficient buffering to minimize CPU usage
+- Implements backpressure mechanisms to handle high-volume events
+- See the full [Privacy Policy](PRIVACY.md) for details
 
-## Error Handling
+## Advanced Error Handling
 
-Linux EDR includes comprehensive error handling to ensure reliable operation:
+To ensure reliable operation in production environments, Linux EDR includes:
 
-- Graceful handling of missing trace_pipe (waits for it to become available)
-- Proper permission error reporting
-- Automatic reopening of trace files if they become unavailable
-- Configurable logging levels and rotation
-- Thread-safe operations with proper resource cleanup
+- Smart retry logic for trace pipe access with configurable backoff
+- Graceful handling of permission errors with clear guidance
+- Automatic reconnection if trace sources become unavailable
+- Thread-safe operations with proper resource management
+- Comprehensive logging with configurable verbosity
+- Clean shutdown mechanisms that preserve data integrity
 
 ## Requirements
 
 - Python 3.11 or later
-- [uv](https://github.com/astral-sh/uv) (required for all dependency management and installation)
+- [uv](https://github.com/astral-sh/uv) for dependency management
 - Linux kernel with ftrace support
-- Appropriate permissions to read from trace_pipe (typically root)
+- Appropriate permissions to read from trace_pipe (typically requires root)
 
 ## Project Structure
 
 ```text
 linux-edr/
 ├── linux_edr/
 │   ├── __init__.py
-│   ├── cli.py            # Typer-based CLI entrypoint
-│   ├── app.py            # Orchestration & lifecycle
+│   ├── cli.py            # Typer-based CLI interface
+│   ├── app.py            # Core application logic
 │   ├── config.py         # Configuration management
-│   ├── trace.py          # Non-blocking ftrace reader
-│   ├── aggregator.py     # Event aggregation & buffering
-│   ├── summary.py        # Summary & statistics builder
-│   ├── reporter.py       # OpenAI + file/HTTP outputs
+│   ├── trace.py          # Non-blocking trace reader
+│   ├── aggregator.py     # Thread-safe event buffering
+│   ├── summary.py        # Report generation
+│   ├── reporter.py       # OpenAI integration and output
+│   ├── report_manager.py # Hierarchical report handling
 │   └── models.py         # Pydantic data models
-├── tests/                # pytest unit & integration tests
-├── docs/                 # MkDocs site
-├── linux-edr.service     # Systemd service file
-├── pyproject.toml        # Build metadata & entry point
+├── tests/                # Comprehensive test suite
+├── docs/                 # Documentation
+├── linux-edr.service     # Systemd service definition
+├── pyproject.toml        # Project metadata
 ├── PRIVACY.md            # Privacy policy
-└── README.md             # Project overview & badges
+└── README.md             # This file
 ```
 
 ## Development
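Review note on the Hierarchical Reporting table in the README hunk above: the Cell and Block arithmetic (15-minute Cells, 16 Cells per Block, 6 Blocks per day) can be checked with a few lines of Python. This is only a sketch. The constant names and the `locate` helper are hypothetical, and aligning Cells to midnight is an assumption, since the diff does not show how `report_manager.py` aligns intervals.

```python
from datetime import datetime
from typing import Tuple

# Values taken from the README table; the names are illustrative only.
CELL_MINUTES = 15
CELLS_PER_BLOCK = 16
BLOCKS_PER_DAY = 6


def locate(ts: datetime) -> Tuple[int, int]:
    """Return (block index, cell index within that block) for a timestamp.

    Assumes Cells are aligned to midnight, which the diff does not confirm.
    """
    minutes_into_day = ts.hour * 60 + ts.minute
    cell_of_day = minutes_into_day // CELL_MINUTES  # 0..95
    return cell_of_day // CELLS_PER_BLOCK, cell_of_day % CELLS_PER_BLOCK


# 16 Cells of 15 minutes make a 4-hour Block; 6 Blocks make a 24-hour day.
assert CELL_MINUTES * CELLS_PER_BLOCK == 4 * 60
assert CELL_MINUTES * CELLS_PER_BLOCK * BLOCKS_PER_DAY == 24 * 60

# 13:50 is minute 830 of the day, so Cell 55 overall: Block 3, Cell 7.
print(locate(datetime(2024, 1, 1, 13, 50)))  # (3, 7)
```

Under these assumptions a day holds 96 Cells, so the last Cell of a day is Cell 15 of Block 5.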
 
@@ -67,12 +67,7 @@ def add(self, event: Dict[str, Any]) -> bool:
             return False
 
     def snapshot_and_clear(self) -> List[Dict[str, Any]]:
-        """
-        Take a snapshot of the current buffer and clear it.
-        
-        Returns:
-            List of events in the buffer
-        """
+        """Take a snapshot of the current buffer and clear it."""
         with self.lock:
             # Remove old events if max_age is set
             if self.max_age_seconds is not None:
@@ -100,12 +95,7 @@ def snapshot_and_clear(self) -> List[Dict[str, Any]]:
         return events
 
     def get_stats(self) -> Dict[str, Any]:
-        """
-        Get statistics about the aggregator.
-        
-        Returns:
-            Dictionary with statistics
-        """
+        """Get statistics about the aggregator."""
         with self.lock:
             return {
                 "buffer_size": len(self.buffer),
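Review note on the aggregator hunks above: the shortened docstrings no longer state what `snapshot_and_clear` returns, so the following sketch spells out the behavior the visible lines suggest (lock, drop events older than `max_age_seconds`, return the rest, clear). The `EventBuffer` class, its `max_size` parameter, the `now` arguments, and the choice to reject events when full are all assumptions made for illustration. The real class in `aggregator.py` is only partially visible in this diff.

```python
import threading
import time
from collections import deque
from typing import Any, Deque, Dict, List, Optional, Tuple


class EventBuffer:
    """Hypothetical stand-in for the aggregator; not the project's class."""

    def __init__(self, max_size: int = 1000,
                 max_age_seconds: Optional[float] = None) -> None:
        self.lock = threading.Lock()
        self.max_size = max_size
        self.max_age_seconds = max_age_seconds
        # Each entry is (arrival time, event); arrival times are non-decreasing.
        self.buffer: Deque[Tuple[float, Dict[str, Any]]] = deque()

    def add(self, event: Dict[str, Any], now: Optional[float] = None) -> bool:
        """Store an event; return False when the buffer is full (assumed policy)."""
        stamp = time.monotonic() if now is None else now
        with self.lock:
            if len(self.buffer) >= self.max_size:
                return False
            self.buffer.append((stamp, event))
            return True

    def snapshot_and_clear(self, now: Optional[float] = None) -> List[Dict[str, Any]]:
        """Return buffered events no older than max_age_seconds, then clear."""
        current = time.monotonic() if now is None else now
        with self.lock:
            if self.max_age_seconds is not None:
                cutoff = current - self.max_age_seconds
                # Oldest entries sit at the left end of the deque.
                while self.buffer and self.buffer[0][0] < cutoff:
                    self.buffer.popleft()
            events = [event for _, event in self.buffer]
            self.buffer.clear()
        return events

    def get_stats(self) -> Dict[str, Any]:
        with self.lock:
            return {"buffer_size": len(self.buffer)}


buf = EventBuffer(max_size=2, max_age_seconds=60)
buf.add({"command": "ls"}, now=0.0)
buf.add({"command": "cat"}, now=50.0)
print(buf.add({"command": "id"}, now=55.0))  # False: buffer already holds 2 events
print(buf.snapshot_and_clear(now=100.0))     # [{'command': 'cat'}]: "ls" is older than 60 s
print(buf.get_stats())                       # {'buffer_size': 0}
```

The explicit `now` arguments exist only to make the example deterministic. Holding one lock across the age filter, the copy, and the clear is what keeps a concurrent `add` from landing between the snapshot and the clear.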
 
@@ -1,5 +1,5 @@
 import logging.config
-from collections import namedtuple, defaultdict
+from collections import defaultdict
 import re
 import os
 from typing import Dict, List, Optional, Any, NamedTuple, Iterator
@@ -45,6 +45,11 @@ class ExecveEvent(NamedTuple):
     command: str
     args: List[str]
 
+# Pre-compile the execve pattern once at import time so parse_execve() can reuse it.
+# Example trace snippet:
+#   12345 [678] ... execve("/usr/bin/python3" "python3" "script.py")
+EXECVE_PATTERN = re.compile(r"(\S+)\s+\[(\d+)\]\s+.*execve.*\((.*?)\)")
+
 def parse_execve(line: str) -> Optional[ExecveEvent]:
     """
     Parse execve events from ftrace output.
@@ -59,8 +64,9 @@ def parse_execve(line: str) -> Optional[ExecveEvent]:
         return None
 
     try:
-        execve_pattern = r'(\S+)\s+\[(\d+)\]\s+.*execve.*\((.*?)\)'
-        match = re.search(execve_pattern, line)
+        # Reuse the pre-compiled pattern. re.search() with a string pattern would go through
+        # the re module's pattern cache on every call; the compiled object skips that lookup.
+        match = EXECVE_PATTERN.search(line)
         if not match:
             return None
 
@@ -77,12 +83,7 @@ def parse_execve(line: str) -> Optional[ExecveEvent]:
         if not cmd_parts:
             return None
 
-        return ExecveEvent(
-            timestamp=timestamp,
-            pid=pid,
-            command=cmd_parts[0].strip('"'),
-            args=cmd_parts[1:] if len(cmd_parts) > 1 else []
-        )
+        return ExecveEvent(timestamp, pid, cmd_parts[0].strip('"'), cmd_parts[1:])
     except Exception as e:
         logging.error(f"Error parsing execve event: {e}, line: {line}")
         return None
@@ -101,25 +102,16 @@ def process_raw_events(events: List[Dict[str, Any]]) -> Dict[str, List[str]]:
         return {}
 
     grouped_events: Dict[str, List[str]] = defaultdict(list)
-    
+
     for event in events:
         try:
-            process_name = event.get("command")
-            if not process_name:
-                continue
-                
-            # Build command line string
-            cmd_line = process_name
-            args = event.get("args", [])
-            if args:
-                cmd_line += " " + " ".join(str(arg) for arg in args)
-                
-            # Add to grouped events
-            grouped_events[process_name].append(cmd_line)
+            if process_name := event.get("command"):
+                # Build the full command line: the command followed by its stringified args
+                cmd_line = " ".join([process_name, *map(str, event.get("args", []))])
+                grouped_events[process_name].append(cmd_line)
         except Exception as e:
-            logging.warning(f"Error processing event {event}: {e}")
-    
-    # Convert defaultdict to regular dict
+            logging.warning("Error processing event %s: %s", event, e)
+
     return dict(grouped_events)
 
 class LinuxEDRApp:
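Review note on the `app.py` hunks above: the sketch below runs the same `EXECVE_PATTERN` against the sample line from the comment and then groups the results the way the new `process_raw_events` body does. Everything except the regular expression is an assumption: `ParsedExecve`, `parse_line`, and `group_by_command` are hypothetical names, the whitespace split of the argument list is a guess at code the diff does not show, and the diff does not reveal how capture groups 1 and 2 map onto `timestamp` and `pid`, so they are kept under neutral field names here.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional

# Same pattern as in the diff, compiled once at import time.
EXECVE_PATTERN = re.compile(r"(\S+)\s+\[(\d+)\]\s+.*execve.*\((.*?)\)")


class ParsedExecve(NamedTuple):
    """Hypothetical result type; field names are neutral on purpose."""
    leading: str       # capture group 1: first whitespace-delimited token
    bracketed: int     # capture group 2: the number inside [...]
    command: str
    args: List[str]


def parse_line(line: str) -> Optional[ParsedExecve]:
    match = EXECVE_PATTERN.search(line)
    if not match:
        return None
    leading, bracketed, raw_args = match.groups()
    # Naive whitespace split (assumed); arguments containing spaces would break.
    parts = [part.strip('"') for part in raw_args.split()]
    if not parts:
        return None
    return ParsedExecve(leading, int(bracketed), parts[0], parts[1:])


def group_by_command(events: Iterable[ParsedExecve]) -> Dict[str, List[str]]:
    grouped: Dict[str, List[str]] = defaultdict(list)
    for event in events:
        grouped[event.command].append(" ".join([event.command, *event.args]))
    return dict(grouped)


lines = [
    '12345 [678] ... execve("/usr/bin/python3" "python3" "script.py")',
    '12346 [678] ... execve("/usr/bin/ls" "ls" "-la")',
    "a line without a matching syscall",
]
parsed = [event for event in map(parse_line, lines) if event is not None]
print(group_by_command(parsed))
# {'/usr/bin/python3': ['/usr/bin/python3 python3 script.py'],
#  '/usr/bin/ls': ['/usr/bin/ls ls -la']}
```

The third line yields `None` because it has no bracketed number, which mirrors the early `return None` path in `parse_execve`.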
 
@@ -19,12 +19,7 @@ def run(
         None, "--debug", "-d", help="Enable debug logging"
     ),
 ):
-    """
-    Run Linux EDR monitoring.
-    
-    The application can be configured via a config.ini file or command line arguments.
-    Command line arguments take precedence over configuration file settings.
-    """
+    """Run Linux EDR monitoring; command line arguments override config.ini settings."""
     LinuxEDRApp(
         config_path=config,
         interval=interval,
@@ -38,12 +33,7 @@ def show_config(
         None, "--config", "-c", help="Path to config file"
     ),
 ):
-    """
-    Show the current configuration values.
-    
-    This displays the effective configuration after loading from file
-    and applying any environment variables.
-    """
+    """Display effective configuration from file and environment variables."""
     from .config import Config
     import json