Commit 79f5e44

Configure MkDocs documentation and deployment workflow
1 parent 6d0ff4e commit 79f5e44

11 files changed, with 177 additions and 177 deletions.

.github/workflows/docs.yml

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+name: Documentation
+
+on:
+  push:
+    branches:
+      - master
+    paths:
+      - 'docs/**'
+      - 'mkdocs.yml'
+      - '.github/workflows/docs.yml'
+  # Allow manual trigger
+  workflow_dispatch:
+
+permissions:
+  contents: write
+
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      - name: Setup Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+
+      - name: Install dependencies
+        run: |
+          uv pip install --system mkdocs-material mkdocstrings mkdocstrings-python mike mkdocs-git-revision-date-localized-plugin
+
+      - name: Configure Git user
+        run: |
+          git config --local user.email "github-actions[bot]@users.noreply.github.com"
+          git config --local user.name "github-actions[bot]"
+
+      - name: Deploy docs
+        run: |
+          mkdocs gh-deploy --force

README.md

Lines changed: 65 additions & 60 deletions
@@ -5,19 +5,22 @@
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 [![Monthly Build](https://github.com/yourusername/linux-edr/actions/workflows/test-and-publish.yml/badge.svg?event=schedule)](https://github.com/yourusername/linux-edr/actions/workflows/test-and-publish.yml)
 
-A lightweight Endpoint Detection and Response (EDR) tool for Linux systems.
+A lightweight yet comprehensive Endpoint Detection and Response (EDR) solution for Linux systems that monitors command execution, analyzes system behavior, and provides actionable security insights with minimal performance impact.
 
-## Features
+## Overview
 
-- Non-blocking trace reader for `/sys/kernel/tracing/trace_pipe`
-- Thread-safe event aggregation with memory protection
-- Process-focused event collection and grouping
-- Scheduled summarization and reporting every 15 minutes
-- OpenAI integration (gpt-4o-mini) for automated threat analysis
-- Configurable output formats (JSON, console)
-- Flexible configuration via config.ini
-- Type-safe implementation with comprehensive error handling
-- Privacy-respecting design (see [Privacy Policy](PRIVACY.md))
+Linux EDR captures process execution data through Linux's kernel tracing capabilities and builds a multi-tiered reporting structure that allows for both real-time threat detection and long-term security trend analysis. By focusing on command execution patterns, it provides valuable security insights without the overhead of traditional EDR solutions.
+
+## Key Features
+
+- **Efficient Monitoring**: Non-blocking trace reader for `/sys/kernel/tracing/trace_pipe` with automatic recovery
+- **Scalable Architecture**: Thread-safe event buffer with configurable capacity and age limits
+- **Smart Data Organization**: Process-focused event collection and intelligent command grouping
+- **Hierarchical Reporting**: Tiered reports from 15-minute snapshots to monthly trend analysis
+- **AI-Enhanced Security**: OpenAI integration with gpt-4o-mini for automated threat detection
+- **Flexible Output**: Configurable reporting to JSON files or console
+- **Production-Ready**: Comprehensive error handling with graceful recovery from failures
+- **Privacy-Focused**: Collects only necessary command execution data (see [Privacy Policy](PRIVACY.md))
 
 ## Installation
 
@@ -47,7 +50,7 @@ linux-edr show-config
 
 ## Data Structure
 
-Linux EDR groups execve events by process name and maintains the full command:
+Linux EDR groups execve events by process name and maintains the full command line for context:
 
 ```json
 {
@@ -106,28 +109,28 @@ include_raw_events = true
 
 ## Hierarchical Reporting Architecture
 
-Linux EDR uses a hierarchical reporting system to provide insights at different time scales:
+Linux EDR implements a sophisticated multi-tiered reporting system that provides security visibility across different time scales:
 
 | Level | Coverage | Name | Description |
 |:-----:|:--------------------|:---------------|:-----------------------------------------------------|
-| 1 | 15 minutes | **Cell** | Base unit covering a 15-minute interval |
-| 2 | 16 Cells = 4 hours | **Block** | Aggregates 16 Cells (4 hours of activity) |
-| 3 | 6 Blocks = 24 hours | **DailyReport**| Consolidates 6 Blocks (full day of activity) |
-| 4 | 7 DailyReports | **WeeklyReport**| Analyzes 7 daily reports (week-long patterns) |
-| 5 | ~4 WeeklyReports | **MonthlyReport**| Long-term analysis of approximately 4 weeks |
+| 1 | 15 minutes | **Cell** | Base unit capturing immediate system activity |
+| 2 | 16 Cells = 4 hours | **Block** | Short-term patterns across multiple Cells |
+| 3 | 6 Blocks = 24 hours | **DailyReport**| Consolidated view of a full day's activity |
+| 4 | 7 DailyReports | **WeeklyReport**| Week-long trends with daily breakdowns |
+| 5 | ~4 WeeklyReports | **MonthlyReport**| Strategic view of monthly security posture |
 
-This multi-level approach enables:
-- Immediate detection of suspicious activity (Cell level)
-- Short-term pattern recognition (Block level)
-- Daily security posture assessment (DailyReport)
-- Weekly trend analysis (WeeklyReport)
-- Monthly strategic security reviews (MonthlyReport)
+This architecture enables:
+- **Immediate threat detection** at the Cell level
+- **Context-rich pattern recognition** at the Block level
+- **Daily security posture assessment** in DailyReports
+- **Trend identification** in WeeklyReports
+- **Strategic security planning** with MonthlyReports
 
-All reports are stored in JSON format under the configured `reports_dir` with subdirectories for each level.
+All reports are automatically stored in JSON format in the configured `reports_dir` with appropriate subdirectories for each level.
 
 ## Systemd Service
 
-Linux EDR can be run as a systemd service:
+Linux EDR can be deployed as a systemd service for continuous monitoring:
 
 1. Copy the service file to systemd directory:
 ```bash
@@ -151,66 +154,68 @@ Linux EDR can be run as a systemd service:
 sudo systemctl status linux-edr.service
 ```
 
-## Automated Analysis
+## Automated Security Analysis
 
-The tool sends process execution data to OpenAI's gpt-4o-mini model for analysis every 15 minutes (configurable). The AI looks for suspicious patterns like:
+Linux EDR leverages OpenAI's gpt-4o-mini model to analyze process execution patterns and identify potential security threats. The analysis focuses on:
 
-- Unusual command execution patterns
+- Unusual command execution patterns and frequencies
 - Potential privilege escalation attempts
-- Data exfiltration attempts
-- Unusual network access
-- Suspicious file operations
+- Command sequences indicating data exfiltration
+- Anomalous network access patterns
+- Suspicious file operations or permission changes
 
-Analysis results are saved alongside the JSON reports with the `.analysis` extension.
+Analysis results are saved alongside JSON reports with the `.analysis` extension, providing actionable insights without requiring manual review of raw data.
 
 ## Privacy and System Impact
 
-Linux EDR is designed to be non-invasive and privacy-respecting:
+Linux EDR is designed with privacy and performance in mind:
 
-- Only monitors execve syscalls, not file contents or keystrokes
-- Stores data locally by default
-- Transmits data externally only with explicit configuration
-- Uses minimal system resources
-- Gracefully handles various error conditions
-- See our full [Privacy Policy](PRIVACY.md)
+- Collects only process execution data, not file contents or user input
+- Stores data locally by default with configurable retention
+- Transmits data externally only when explicitly configured
+- Uses non-blocking I/O and efficient buffering to minimize CPU usage
+- Implements backpressure mechanisms to handle high-volume events
+- See the full [Privacy Policy](PRIVACY.md) for details
 
-## Error Handling
+## Advanced Error Handling
 
-Linux EDR includes comprehensive error handling to ensure reliable operation:
+To ensure reliable operation in production environments, Linux EDR includes:
 
-- Graceful handling of missing trace_pipe (waits for it to become available)
-- Proper permission error reporting
-- Automatic reopening of trace files if they become unavailable
-- Configurable logging levels and rotation
-- Thread-safe operations with proper resource cleanup
+- Smart retry logic for trace pipe access with configurable backoff
+- Graceful handling of permission errors with clear guidance
+- Automatic reconnection if trace sources become unavailable
+- Thread-safe operations with proper resource management
+- Comprehensive logging with configurable verbosity
+- Clean shutdown mechanisms that preserve data integrity
 
 ## Requirements
 
 - Python 3.11 or later
-- [uv](https://github.com/astral-sh/uv) (required for all dependency management and installation)
+- [uv](https://github.com/astral-sh/uv) for dependency management
 - Linux kernel with ftrace support
-- Appropriate permissions to read from trace_pipe (typically root)
+- Appropriate permissions to read from trace_pipe (typically requires root)
 
 ## Project Structure
 
 ```text
 linux-edr/
 ├── linux_edr/
 │   ├── __init__.py
-│   ├── cli.py # Typer-based CLI entrypoint
-│   ├── app.py # Orchestration & lifecycle
+│   ├── cli.py # Typer-based CLI interface
+│   ├── app.py # Core application logic
 │   ├── config.py # Configuration management
-│   ├── trace.py # Non-blocking ftrace reader
-│   ├── aggregator.py # Event aggregation & buffering
-│   ├── summary.py # Summary & statistics builder
-│   ├── reporter.py # OpenAI + file/HTTP outputs
+│   ├── trace.py # Non-blocking trace reader
+│   ├── aggregator.py # Thread-safe event buffering
+│   ├── summary.py # Report generation
+│   ├── reporter.py # OpenAI integration and output
+│   ├── report_manager.py # Hierarchical report handling
 │   └── models.py # Pydantic data models
-├── tests/ # pytest unit & integration tests
-├── docs/ # MkDocs site
-├── linux-edr.service # Systemd service file
-├── pyproject.toml # Build metadata & entry point
+├── tests/ # Comprehensive test suite
+├── docs/ # Documentation
+├── linux-edr.service # Systemd service definition
+├── pyproject.toml # Project metadata
 ├── PRIVACY.md # Privacy policy
-└── README.md # Project overview & badges
+└── README.md # This file
 ```
 
 ## Development
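
The tier sizes in the README's Hierarchical Reporting table can be checked with a few lines of arithmetic. This is only a sketch: `CELL`, `TIERS`, and `tier_durations` are hypothetical names that do not appear in the commit, and the monthly tier is taken as exactly 4 weeks even though the README says "~4".

```python
from datetime import timedelta

# Base unit from the README table: one Cell covers 15 minutes.
CELL = timedelta(minutes=15)

# (tier name, number of lower-tier units it aggregates), per the README table.
# The monthly count of 4 is an assumption; the README only says "~4".
TIERS = [
    ("Block", 16),
    ("DailyReport", 6),
    ("WeeklyReport", 7),
    ("MonthlyReport", 4),
]


def tier_durations() -> dict[str, timedelta]:
    """Return the time span covered by each reporting tier."""
    durations = {"Cell": CELL}
    span = CELL
    for name, count in TIERS:
        span = span * count  # each tier multiplies the span of the one below it
        durations[name] = span
    return durations


if __name__ == "__main__":
    for name, span in tier_durations().items():
        print(f"{name:<14}{span}")
```

Under these assumptions a Block spans 4 hours and a DailyReport 24 hours, which matches the "Coverage" column.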

linux_edr/aggregator.py

Lines changed: 2 additions & 12 deletions
@@ -67,12 +67,7 @@ def add(self, event: Dict[str, Any]) -> bool:
             return False
 
     def snapshot_and_clear(self) -> List[Dict[str, Any]]:
-        """
-        Take a snapshot of the current buffer and clear it.
-
-        Returns:
-            List of events in the buffer
-        """
+        """Take a snapshot of the current buffer and clear it."""
         with self.lock:
             # Remove old events if max_age is set
             if self.max_age_seconds is not None:
@@ -100,12 +95,7 @@ def snapshot_and_clear(self) -> List[Dict[str, Any]]:
             return events
 
     def get_stats(self) -> Dict[str, Any]:
-        """
-        Get statistics about the aggregator.
-
-        Returns:
-            Dictionary with statistics
-        """
+        """Get statistics about the aggregator."""
         with self.lock:
             return {
                 "buffer_size": len(self.buffer),
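
The hunks above show only fragments of the aggregator, so the following is a minimal sketch of the snapshot-and-clear pattern they suggest, not the project's implementation. The class name `EventBuffer`, the `max_size` parameter, and the `_received` timestamp field are assumptions made for illustration; only `lock`, `buffer`, `max_age_seconds`, and the three method names come from the diff.

```python
import threading
import time
from typing import Any, Dict, List, Optional


class EventBuffer:
    """Hypothetical stand-in for the aggregator class touched by this commit."""

    def __init__(self, max_size: int = 1000,
                 max_age_seconds: Optional[float] = None) -> None:
        self.lock = threading.Lock()
        self.buffer: List[Dict[str, Any]] = []
        self.max_size = max_size              # assumed capacity limit
        self.max_age_seconds = max_age_seconds

    def add(self, event: Dict[str, Any]) -> bool:
        """Append an event; return False when the buffer is full."""
        with self.lock:
            if len(self.buffer) >= self.max_size:
                return False
            # Record arrival time so stale events can be dropped later
            # ("_received" is an assumed field name).
            self.buffer.append({**event, "_received": time.monotonic()})
            return True

    def snapshot_and_clear(self) -> List[Dict[str, Any]]:
        """Take a snapshot of the current buffer and clear it."""
        with self.lock:
            # Swap the list under the lock so writers never see a half-cleared buffer.
            events, self.buffer = self.buffer, []
        if self.max_age_seconds is not None:
            cutoff = time.monotonic() - self.max_age_seconds
            events = [e for e in events if e["_received"] >= cutoff]
        return events

    def get_stats(self) -> Dict[str, Any]:
        """Get statistics about the buffer."""
        with self.lock:
            return {"buffer_size": len(self.buffer)}
```

Swapping in a fresh list while holding the lock keeps the critical section short: the age filter runs on the detached snapshot, so producers calling `add` are not blocked by it.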

linux_edr/app.py

Lines changed: 17 additions & 25 deletions
@@ -1,5 +1,5 @@
 import logging.config
-from collections import namedtuple, defaultdict
+from collections import defaultdict
 import re
 import os
 from typing import Dict, List, Optional, Any, NamedTuple, Iterator
@@ -45,6 +45,11 @@ class ExecveEvent(NamedTuple):
     command: str
     args: List[str]
 
+# Pre-compile the execve pattern once at import time for better performance
+# Example trace snippet:
+# "12345 [678] ... execve("/usr/bin/python3" \"python3\" \"script.py\")"
+EXECVE_PATTERN = re.compile(r"(\S+)\s+\[(\d+)\]\s+.*execve.*\((.*?)\)")
+
 def parse_execve(line: str) -> Optional[ExecveEvent]:
     """
     Parse execve events from ftrace output.
@@ -59,8 +64,9 @@ def parse_execve(line: str) -> Optional[ExecveEvent]:
         return None
 
     try:
-        execve_pattern = r'(\S+)\s+\[(\d+)\]\s+.*execve.*\((.*?)\)'
-        match = re.search(execve_pattern, line)
+        # Re-use the pre-compiled pattern; compiling inside the tight loop is unnecessarily
+        # expensive when processing thousands of trace lines per second.
+        match = EXECVE_PATTERN.search(line)
         if not match:
             return None
 
@@ -77,12 +83,7 @@ def parse_execve(line: str) -> Optional[ExecveEvent]:
         if not cmd_parts:
             return None
 
-        return ExecveEvent(
-            timestamp=timestamp,
-            pid=pid,
-            command=cmd_parts[0].strip('"'),
-            args=cmd_parts[1:] if len(cmd_parts) > 1 else []
-        )
+        return ExecveEvent(timestamp, pid, cmd_parts[0].strip('"'), cmd_parts[1:])
     except Exception as e:
         logging.error(f"Error parsing execve event: {e}, line: {line}")
         return None
@@ -101,25 +102,16 @@ def process_raw_events(events: List[Dict[str, Any]]) -> Dict[str, List[str]]:
         return {}
 
     grouped_events: Dict[str, List[str]] = defaultdict(list)
-
+
     for event in events:
         try:
-            process_name = event.get("command")
-            if not process_name:
-                continue
-
-            # Build command line string
-            cmd_line = process_name
-            args = event.get("args", [])
-            if args:
-                cmd_line += " " + " ".join(str(arg) for arg in args)
-
-            # Add to grouped events
-            grouped_events[process_name].append(cmd_line)
+            if process_name := event.get("command"):
+                # Join command and args in the most compact/pythonic way
+                cmd_line = " ".join([process_name, *map(str, event.get("args", []))])
+                grouped_events[process_name].append(cmd_line)
         except Exception as e:
-            logging.warning(f"Error processing event {event}: {e}")
-
-    # Convert defaultdict to regular dict
+            logging.warning("Error processing event %s: %s", event, e)
+
     return dict(grouped_events)
 
 class LinuxEDRApp:
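
To see what the two rewritten pieces of `app.py` do in isolation, here is a small self-contained sketch. The regular expression and the grouping expression are copied from the diff; `split_execve_args` and `group_commands` are hypothetical helper names, and the use of `shlex.split` to tokenize the argument list is an assumption, since the commit does not show how `cmd_parts` is built.

```python
import re
import shlex
from collections import defaultdict
from typing import Any, Dict, List, Optional

# Same expression as in the diff, compiled once at import time.
EXECVE_PATTERN = re.compile(r"(\S+)\s+\[(\d+)\]\s+.*execve.*\((.*?)\)")


def split_execve_args(line: str) -> Optional[List[str]]:
    """Return the execve arguments found in one trace line, or None."""
    match = EXECVE_PATTERN.search(line)
    if not match:
        return None
    # Assumption: arguments are double-quoted and whitespace-separated,
    # as in the example comment in the diff.
    return shlex.split(match.group(3))


def group_commands(events: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group full command lines by process name, skipping events without one."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for event in events:
        if process_name := event.get("command"):
            cmd_line = " ".join([process_name, *map(str, event.get("args", []))])
            grouped[process_name].append(cmd_line)
    return dict(grouped)


if __name__ == "__main__":
    sample = '12345 [678] ... execve("/usr/bin/python3" "python3" "script.py")'
    print(split_execve_args(sample))
    print(group_commands([
        {"command": "ls", "args": ["-la"]},
        {"command": "ls"},
        {"args": ["orphan"]},  # no "command" key, so it is skipped
    ]))
```

One design note: `re.search` called with a string pattern already consults the `re` module's internal cache of compiled patterns, so hoisting the pattern to module level mainly saves the per-call cache lookup and makes the pattern reusable by name; the gain is likely modest rather than dramatic.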

linux_edr/cli.py

Lines changed: 2 additions & 12 deletions
@@ -19,12 +19,7 @@ def run(
         None, "--debug", "-d", help="Enable debug logging"
     ),
 ):
-    """
-    Run Linux EDR monitoring.
-
-    The application can be configured via a config.ini file or command line arguments.
-    Command line arguments take precedence over configuration file settings.
-    """
+    """Run Linux EDR monitoring with config.ini or command line arguments."""
     LinuxEDRApp(
         config_path=config,
         interval=interval,
@@ -38,12 +33,7 @@ def show_config(
         None, "--config", "-c", help="Path to config file"
     ),
 ):
-    """
-    Show the current configuration values.
-
-    This displays the effective configuration after loading from file
-    and applying any environment variables.
-    """
+    """Display effective configuration from file and environment variables."""
     from .config import Config
     import json
 
