Sentinel Pipeline

A security gate tool for Python projects that integrates multiple security scanners into a unified CI/CD pipeline with policy-based enforcement and baseline management.

Features

Multi-Tool Integration: Run Bandit, pip-audit, and Semgrep from a single command
Policy-Based Enforcement: Configure severity thresholds for pass/warn/fail
Baseline Management: Suppress known low/medium findings while enforcing new issues
Structured Output: JSON reports for automation and auditing
Configuration File Support: Define project security policies in .sentinel.toml
Flexible Exclusions: Customize which files/directories to skip
Tool Validation: Check if required tools are installed before running
Detailed Logging: Configurable log levels for debugging
Zero Dependencies: Core runs on Python stdlib (tools installed separately)
ML-Based False Positive Reduction: Intelligent scoring to predict which findings are likely false positives

ML-Based False Positive Reduction

Sentinel can use machine learning to predict which findings are likely false positives, helping you focus on real security issues.

Quick Start with ML

# Use heuristic scoring (no model or dependencies needed)
sentinel --ml-enabled

# Use a trained model
sentinel --ml-enabled --ml-model-path models/my-model.json

How It Works

The ML scorer analyzes each finding using 25+ features:

File Path Signals:

Test files (tests/, *_test.py)
Scripts and tools (scripts/, bin/)
Migrations (migrations/, alembic/)
Example/demo code
Vendor/third-party code

Code Pattern Detection:

User input sources (request., sys.argv)
Shell execution (shell=True, os.system)
Dangerous functions (eval, exec)
Hardcoded secrets (password =, api_key =)
SQL queries with string formatting
File operations, network calls, crypto usage

Report Output:

{
  "ml_score": 0.234,
  "ml_label": "likely_fp",
  "ml_confidence": 0.532,
  "ml_reason": [
    {"feature": "is_test_file", "contribution": -0.3},
    {"feature": "severity_high_critical", "contribution": 0.2}
  ],
  "model_type": "heuristic"
}
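As a rough illustration of how a heuristic scorer could turn path and code signals into the fields shown above, here is a minimal sketch. The feature names beyond is_test_file and severity_high_critical, the weights, the 0.5 starting point, and the "likely_tp" label are assumptions for illustration, not Sentinel's actual values.

```python
from __future__ import annotations

import re

# Hypothetical weights: negative pushes toward "false positive",
# positive pushes toward "true positive". Not Sentinel's real values.
WEIGHTS = {
    "is_test_file": -0.3,
    "is_vendor_code": -0.2,
    "uses_shell_true": 0.25,
    "uses_eval_exec": 0.25,
    "severity_high_critical": 0.2,
}


def extract_features(path: str, snippet: str, severity: str) -> dict[str, bool]:
    """Derive a few boolean signals from the file path and code snippet."""
    parts = path.replace("\\", "/").split("/")
    return {
        "is_test_file": "tests" in parts or path.endswith("_test.py"),
        "is_vendor_code": "vendor" in parts or "third_party" in parts,
        "uses_shell_true": "shell=True" in snippet or "os.system" in snippet,
        "uses_eval_exec": re.search(r"\b(eval|exec)\s*\(", snippet) is not None,
        "severity_high_critical": severity in {"high", "critical"},
    }


def score_finding(path: str, snippet: str, severity: str) -> dict:
    """Sum the weights of active features around a neutral 0.5 and clamp to [0, 1]."""
    features = extract_features(path, snippet, severity)
    reasons = [
        {"feature": name, "contribution": WEIGHTS[name]}
        for name, active in features.items()
        if active
    ]
    raw = 0.5 + sum(reason["contribution"] for reason in reasons)
    score = min(1.0, max(0.0, raw))
    return {
        "ml_score": round(score, 3),
        # Assumed convention: low scores mean "probably a false positive".
        "ml_label": "likely_fp" if score < 0.5 else "likely_tp",
        "ml_reason": reasons,
        "model_type": "heuristic",
    }
```

Keeping each contribution alongside the score is what makes the result explainable: a reviewer can see which signals moved the finding toward or away from "false positive".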

Training Custom Models

You can train models on your own labeled data:

Label findings as true/false positives in my_training_data.json:

{
  "findings": [
    {
      "finding": {...},
      "label": false,
      "code_snippet": "assert user.is_authenticated"
    }
  ]
}
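Before training, it can help to check that the labeled file has the shape shown above. The sketch below is one way to do that with the standard library; the function names are hypothetical, and it only assumes the three keys from the example (finding, label, code_snippet).

```python
from __future__ import annotations

import json
from collections import Counter


def load_training_examples(text: str) -> list[dict]:
    """Parse labeled findings and reject entries with a missing or non-boolean label."""
    data = json.loads(text)
    examples = []
    for index, item in enumerate(data["findings"]):
        if not isinstance(item.get("finding"), dict):
            raise ValueError(f"findings[{index}]: 'finding' must be an object")
        if not isinstance(item.get("label"), bool):
            raise ValueError(f"findings[{index}]: 'label' must be true or false")
        examples.append(
            {
                "finding": item["finding"],
                "label": item["label"],
                # Treat the snippet as optional in this sketch.
                "code_snippet": item.get("code_snippet", ""),
            }
        )
    return examples


def label_balance(examples: list[dict]) -> Counter:
    """Count how many examples carry each label value."""
    return Counter(example["label"] for example in examples)
```

A heavily skewed label balance is worth noticing before training, since a model trained on almost only one class tends to predict that class for everything.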

Train the model (requires scikit-learn):

pip install ".[ml]"
python examples/train_ml_model.py

Use your model:

sentinel --ml-enabled --ml-model-path models/sentinel-ml-model.json

Configuration

Add to .sentinel.toml:

[ml]
enabled = true
model_path = "models/sentinel-ml-model.json"  # Optional

Benefits

Explainable: See top contributing features for each score
Lightweight: Heuristic mode works without dependencies
Trainable: Learn from your team's historical data
Optional: ML is off by default, fully opt-in

See docs/ml_scoring.md for details.

Installation

Install Sentinel

pip install -e .

Install External Tools

Sentinel requires external security tools to be installed:

# Install security scanners
pip install bandit pip-audit semgrep

# Or install specific versions
pip install bandit==1.7.5 pip-audit==2.6.1 semgrep==1.45.0
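To confirm the scanners are reachable after installation, a check along these lines may be useful. It is a sketch, not Sentinel's own validation code: it looks for the executable on PATH first and then falls back to running the tool as a Python module. The module names in TOOL_MODULES are assumptions.

```python
from __future__ import annotations

import importlib.util
import shutil
import sys

# Assumed mapping from command name to importable module name.
TOOL_MODULES = {"bandit": "bandit", "pip-audit": "pip_audit", "semgrep": "semgrep"}


def find_tool(name: str) -> list[str] | None:
    """Return a command prefix that launches the tool, or None if it is missing."""
    if shutil.which(name):
        return [name]
    module = TOOL_MODULES.get(name, name.replace("-", "_"))
    if importlib.util.find_spec(module) is not None:
        # Not on PATH, but importable: run it through the current interpreter.
        return [sys.executable, "-m", module]
    return None


def missing_tools(names: list[str]) -> list[str]:
    """List the tools that cannot be launched either way."""
    return [name for name in names if find_tool(name) is None]
```

For example, missing_tools(["bandit", "pip-audit", "semgrep"]) returns an empty list when all three scanners are available.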

Quick Start

Basic Usage

Scan the current directory:

sentinel

Scan a specific directory:

sentinel /path/to/repo

Select specific tools:

sentinel --tools bandit,pip-audit

Baseline Workflow

Create a baseline from current findings:

sentinel --write-baseline sentinel-baseline.json

Run scan with baseline filtering:

sentinel --baseline sentinel-baseline.json

Only new findings (or existing high/critical) will cause failures.

Baselined low/medium findings are suppressed. New findings, and any high/critical finding whether or not it existed when the baseline was written, still count toward the policy.

Configuration

Configuration File

Create .sentinel.toml in your project root:

[policy]
fail_on = ["high", "critical"]
warn_on = ["medium"]

tools = ["bandit", "pip-audit", "semgrep"]

exclusions = ["vendor", "third_party"]

baseline_path = "sentinel-baseline.json"
report_path = "sentinel-report.json"

log_level = "INFO"

See .sentinel.toml.example for a complete example.

Command-Line Options

sentinel [path] [options]

Positional Arguments:
  path                  Path to repository (default: current directory)

Options:
  --baseline PATH       Path to baseline JSON file for filtering
  --write-baseline PATH Create baseline file and exit
  --out PATH            Output report path (default: sentinel-report.json)
  --tools TOOLS         Comma-separated tools to run (default: all)
  --config PATH         Path to config file (default: .sentinel.toml)
  --exclude PATTERN     Exclusion pattern (can be repeated)
  --log-level LEVEL     Logging level (DEBUG, INFO, WARNING, ERROR)
  --skip-tool-check     Skip checking if external tools are installed
  --ml-enabled          Enable ML false positive scoring (off by default)
  --ml-model-path PATH  Path to a trained model (heuristic scoring if omitted)
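The option semantics above (a repeatable flag, a comma-separated list, a defaulted path) map naturally onto argparse. The following is a sketch of how a subset of these options could be declared; it is not Sentinel's actual parser, and split_tools is a hypothetical helper.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare a subset of the documented options."""
    parser = argparse.ArgumentParser(prog="sentinel")
    parser.add_argument("path", nargs="?", default=".", help="Path to repository")
    parser.add_argument("--baseline", metavar="PATH")
    parser.add_argument("--out", metavar="PATH", default="sentinel-report.json")
    parser.add_argument("--tools", metavar="TOOLS", default=None)
    # action="append" collects every occurrence of --exclude into a list.
    parser.add_argument("--exclude", metavar="PATTERN", action="append", default=[])
    parser.add_argument(
        "--log-level",
        metavar="LEVEL",
        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
        default="INFO",
    )
    parser.add_argument("--skip-tool-check", action="store_true")
    return parser


def split_tools(value: str | None) -> list[str] | None:
    """Turn 'bandit,pip-audit' into ['bandit', 'pip-audit']; None means all tools."""
    if value is None:
        return None
    return [name.strip() for name in value.split(",") if name.strip()]
```

With this parser, "sentinel repo --exclude vendor --exclude build" yields args.exclude == ["vendor", "build"].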

Exit Codes

Sentinel uses exit codes to integrate with CI/CD:

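0: Passed (no findings at a warn_on or fail_on severity)
1: Passed with warnings (findings at a warn_on severity only; medium with the example policy)
2: Failed (findings at a fail_on severity; high or critical with the example policy)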

Configure thresholds in .sentinel.toml:

[policy]
fail_on = ["critical"]        # Only critical findings fail
warn_on = ["high", "medium"]  # High and medium trigger warnings
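The mapping from severity counts to exit code can be sketched in a few lines. This is an illustration of the rule described above, with a hypothetical function name, not the project's implementation.

```python
from __future__ import annotations


def exit_code(counts: dict[str, int], fail_on: list[str], warn_on: list[str]) -> int:
    """Return 2 if any fail_on severity is present, 1 if any warn_on severity is, else 0."""
    if any(counts.get(severity, 0) > 0 for severity in fail_on):
        return 2
    if any(counts.get(severity, 0) > 0 for severity in warn_on):
        return 1
    return 0


counts = {"low": 5, "medium": 2, "high": 1, "critical": 0}

# Policy failing on high and critical: the single high finding fails the gate.
strict = exit_code(counts, fail_on=["high", "critical"], warn_on=["medium"])

# Relaxed policy from the snippet above: high is downgraded to a warning.
relaxed = exit_code(counts, fail_on=["critical"], warn_on=["high", "medium"])
```

Here strict is 2 and relaxed is 1. Checking fail_on before warn_on ensures a failure is never masked by a warning.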

Reports

JSON Report Structure

{
  "repo_path": "/path/to/repo",
  "generated_at": "2024-01-15T10:30:00Z",
  "findings": [
    {
      "tool": "bandit",
      "severity": "high",
      "title": "B602",
      "path": "app/shell.py",
      "line": 42,
      "message": "subprocess call with shell=True",
      "metadata": {
        "test_id": "B602",
        "confidence": "high"
      },
      "fingerprint": "a1b2c3d4e5f6g7h8"
    }
  ],
  "counts": {
    "low": 5,
    "medium": 2,
    "high": 1,
    "critical": 0
  }
}
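Because the report is plain JSON, downstream scripts can consume it directly. The sketch below recomputes the severity counts from the findings list and formats individual findings as one-line summaries; the helper names are hypothetical and only the fields shown above are assumed.

```python
from __future__ import annotations

from collections import Counter

SEVERITIES = ("critical", "high", "medium", "low")


def count_by_severity(findings: list[dict]) -> dict[str, int]:
    """Tally findings per severity, including zeros for absent levels."""
    tally = Counter(finding["severity"] for finding in findings)
    return {severity: tally.get(severity, 0) for severity in SEVERITIES}


def format_finding(finding: dict) -> str:
    """Render one finding as 'title | path:line | fingerprint'."""
    return (
        f"- {finding['title']} | {finding['path']}:{finding['line']}"
        f" | {finding['fingerprint']}"
    )


def top_findings(report: dict, severity: str = "high", limit: int = 5) -> list[str]:
    """Return formatted lines for the first few findings at the given severity."""
    matching = [f for f in report["findings"] if f["severity"] == severity]
    return [format_finding(f) for f in matching[:limit]]
```

Usage: load the file with json.load, then call top_findings(report) to list high-severity findings for a chat notification or a pull request comment.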

Console Output

Sentinel summary:
critical:        0
    high:        1
  medium:        2
     low:        5

Report: sentinel-report.json

Top high findings:
- B602 | app/shell.py:42 | a1b2c3d4e5f6g7h8

Baseline Management

How Baselines Work

Baselines allow you to acknowledge existing technical debt while preventing new issues:

Create Baseline: Captures fingerprints of current low/medium findings
Filter on Scan: Suppresses baselined low/medium findings
Never Suppress High/Critical: High and critical findings always surface
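
The three rules above can be sketched as two small functions. This is an illustration under the assumption that a baseline is simply a collection of fingerprints; the on-disk format of the baseline file is not shown here and the function names are hypothetical.

```python
from __future__ import annotations

# Only these severities may ever be suppressed by a baseline.
SUPPRESSIBLE = {"low", "medium"}


def build_baseline(findings: list[dict]) -> list[str]:
    """Collect fingerprints of current low/medium findings, sorted for stable diffs."""
    return sorted(
        {f["fingerprint"] for f in findings if f["severity"] in SUPPRESSIBLE}
    )


def apply_baseline(findings: list[dict], baseline: list[str]) -> list[dict]:
    """Drop baselined low/medium findings; high/critical always pass through."""
    known = set(baseline)
    return [
        f
        for f in findings
        if not (f["severity"] in SUPPRESSIBLE and f["fingerprint"] in known)
    ]
```

Note that apply_baseline checks severity as well as fingerprint, so a high finding stays visible even if its fingerprint somehow ends up in the baseline file.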

Creating a Baseline

# Scan and create baseline
sentinel --write-baseline baseline.json

# Commit to version control
git add baseline.json
git commit -m "Add security baseline"

Using a Baseline

# Run with baseline filtering
sentinel --baseline baseline.json

Or configure in .sentinel.toml:

baseline_path = "sentinel-baseline.json"

Baseline Strategy

Baseline low/medium findings during initial adoption
Gradually fix baselined issues over time
Never rely on a baseline for high/critical findings (they are never suppressed)
Review baseline changes in pull requests

Tool-Specific Notes

Bandit

Scans Python code for common security issues:

# Runs: bandit -r . -f json --quiet

Default exclusions: .venv, .git, __pycache__, etc.
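
To show how Bandit's JSON output could be mapped onto the finding shape used in Sentinel reports, here is a sketch. The Bandit field names (results, test_id, issue_severity, issue_confidence, issue_text, filename, line_number) come from Bandit's JSON formatter; the mapping itself and the function name are assumptions, and the fingerprint field is omitted.

```python
from __future__ import annotations

import json


def parse_bandit_json(output: str) -> list[dict]:
    """Convert Bandit's JSON 'results' into report-style finding dictionaries."""
    data = json.loads(output)
    findings = []
    for result in data.get("results", []):
        findings.append(
            {
                "tool": "bandit",
                # Bandit reports severities in upper case (LOW, MEDIUM, HIGH).
                "severity": result["issue_severity"].lower(),
                "title": result["test_id"],
                "path": result["filename"],
                "line": result["line_number"],
                "message": result["issue_text"],
                "metadata": {
                    "test_id": result["test_id"],
                    "confidence": result["issue_confidence"].lower(),
                },
            }
        )
    return findings
```

Normalizing every tool's output into one shape like this is what lets policy and baseline logic stay tool-agnostic.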

pip-audit

Scans Python dependencies for known vulnerabilities:

# Runs: pip-audit -r requirements.txt -f json

Looks for requirements.txt or requirements-dev.txt.

Semgrep

Pattern-based static analysis:

# Runs: semgrep scan --config=auto --json --quiet

Uses Semgrep Registry rules (requires network access).

CI/CD Integration

GitHub Actions

name: Security Gate

on: [push, pull_request]

jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install tools
        run: |
          pip install sentinel-pipeline
          pip install bandit pip-audit semgrep

      - name: Run Sentinel
        run: sentinel --baseline sentinel-baseline.json

      - name: Upload report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: security-report
          path: sentinel-report.json

GitLab CI

security_gate:
  stage: test
  image: python:3.11
  before_script:
    - pip install sentinel-pipeline bandit pip-audit semgrep
  script:
    - sentinel --baseline sentinel-baseline.json
  artifacts:
    when: always
    paths:
      - sentinel-report.json

Development

Running Tests

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run tests with coverage
pytest --cov=sentinel

Code Quality

# Format code
ruff format .

# Lint code
ruff check .

Architecture

See docs/architecture.md for detailed architecture documentation.

Key Design Principles

JSON as Source of Truth: All decisions derive from structured output
Fail Loudly: Tool errors become high-severity findings
Protocol-Based Runners: Easy to add new scanners
Deterministic Fingerprints: Enable reproducible baselining
Zero Network Calls: All operations are local (except tools themselves)
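
As an illustration of the deterministic fingerprint principle, the sketch below hashes a fixed set of finding fields. Which fields Sentinel actually hashes is not stated here; the choice of tool, title, path, and message, the exclusion of the line number, and the 16-character truncation are all assumptions.

```python
from __future__ import annotations

import hashlib


def fingerprint(finding: dict) -> str:
    """Hash stable fields of a finding into a short hex identifier."""
    # The line number is deliberately left out in this sketch so that
    # unrelated edits above the finding do not change its identity.
    parts = [finding["tool"], finding["title"], finding["path"], finding["message"]]
    # A separator unlikely to appear in the fields avoids ambiguous joins.
    payload = "\x1f".join(parts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]
```

Because the hash depends only on the finding's content, two scans of the same code produce the same fingerprints, which is what makes a committed baseline reproducible across machines and CI runs.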

Documentation

Architecture: System design and components
Design Decisions: Rationale for key choices
Threat Model: Security considerations

Contributing

Contributions welcome! Please:

Add tests for new features
Update documentation
Follow existing code style
Ensure all tests pass

License

MIT License - see LICENSE file for details

Changelog

v0.3.0 (Current)

Added ML-based false positive prediction system
Heuristic scoring with 25+ features (no dependencies required)
Optional trained model support using logistic regression
Explainable predictions with feature contributions
ML scoring integration in CLI and reports
Training infrastructure with example data
Comprehensive ML documentation
Support for running tools as Python modules when not in PATH
Enhanced tool validation to check both PATH and module execution

v0.2.0

Added configuration file support (.sentinel.toml)
Added tool availability validation
Added path validation in CLI
Made exclusions configurable
Added structured logging system
Improved error messages with better context
Added comprehensive test coverage (53 tests)
Fixed duplicate code in semgrep runner
Completed architecture and threat model documentation

v0.1.0

Initial release
Basic scanner integration (Bandit, pip-audit, Semgrep)
Baseline management
JSON reports
Policy-based exit codes
