

elb-pr edited this page Apr 7, 2026 · 2 revisions

Investigation Scripts: Data Collection & Analysis Tools

This page provides an overview of the 15+ purpose-built Python scripts located in skills/claude-sleuth/scripts/. These tools automate the technical heavy lifting of an investigation, from initial domain reconnaissance and identity enumeration to complex graph analysis and forensic evidence preservation.

The toolkit is designed to be modular, with dependencies managed by setup.py and execution orchestrated by task_runner.py.

Script-to-Phase Mapping

Scripts are mapped to specific steps in the 6-phase intelligence cycle. This ordering ensures that advanced analytical tools (such as PageRank analysis) run only after the necessary data (POLE entities) has been collected and resolved.

| Phase | Steps | Primary scripts |
|---|---|---|
| Phase 2: Collection | 3, 5 | source_grader.py, evidence_preservation.py, content_archiver.py |
| Phase 3: Collation | 6, 7, 8 | entity_resolver.py, corporate_intel.py, domain_intel.py, username_enum.py, sanctions_screen.py |
| Phase 4: Processing | 9, 10, 11 | chronological_matrix.py, network_graph.py, geolocation.py |
| Phase 6: Reporting | 13, 14, 15 | report_generator.py, financial_analysis.py |
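The ordering in the table can be encoded as data so that an orchestrator rejects out-of-order runs. The sketch below is hypothetical and is not taken from task_runner.py: the names `PHASE_SCRIPTS`, `SCRIPT_PHASE` and `can_run` are invented, and the rule that every earlier mapped phase must be complete is an assumption (Phases 1 and 5 list no scripts in the table, so they are not checked).

```python
# Hypothetical sketch: phase-to-script mapping taken from the table above,
# plus an assumed ordering rule (all earlier mapped phases must be complete).
from typing import Dict, List, Set

PHASE_SCRIPTS: Dict[int, List[str]] = {
    2: ["source_grader.py", "evidence_preservation.py", "content_archiver.py"],
    3: ["entity_resolver.py", "corporate_intel.py", "domain_intel.py",
        "username_enum.py", "sanctions_screen.py"],
    4: ["chronological_matrix.py", "network_graph.py", "geolocation.py"],
    6: ["report_generator.py", "financial_analysis.py"],
}

# Reverse lookup: script name -> phase number.
SCRIPT_PHASE: Dict[str, int] = {
    script: phase for phase, scripts in PHASE_SCRIPTS.items() for script in scripts
}


def can_run(script: str, completed_phases: Set[int]) -> bool:
    """Return True once every mapped phase earlier than the script's own is complete."""
    phase = SCRIPT_PHASE[script]
    earlier = {p for p in PHASE_SCRIPTS if p < phase}
    return earlier <= completed_phases


# network_graph.py (Phase 4) is blocked until collection and collation are done.
print(can_run("network_graph.py", {2}))     # False
print(can_run("network_graph.py", {2, 3}))  # True
```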

System-to-Code Entity Bridge: Collection & Resolution

The following diagram illustrates how natural language investigative requests map to specific scripts, the classes inside them, and the CSDb tables they write to.

Data Ingestion and Entity Linkage Flow

```mermaid
graph TD
    subgraph "Natural Language Space"
        A["'Verify this source'"]
        B["'Link these people'"]
        C["'Check social media'"]
    end

    subgraph "Code Entity Space (scripts/)"
        A --> D["source_grader.py"]
        B --> E["entity_resolver.py"]
        C --> F["username_enum.py"]

        D --> D1["AdmiraltyGrader class"]
        E --> E1["FellegiSunterMatcher"]
        F --> F1["MaigretScanner"]
    end

    subgraph "Data Storage (CSDb)"
        D1 --> G[("source_grades table")]
        E1 --> H[("entities table")]
    end
```
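The diagram above shows source_grader.py writing to a source_grades table in CSDb. A minimal sketch of that step might look like the following, assuming CSDb is a SQLite database and the table holds only a source URL and a two-character grade (both assumptions). `AdmiraltyGrade` and `save_grade` are hypothetical names and are not the script's AdmiraltyGrader class; the A-F and 1-6 labels are the standard Admiralty (NATO) system definitions.

```python
# Hypothetical sketch: validate an Admiralty grade and store it in a
# "source_grades" table. Assumes CSDb is SQLite; the schema is invented.
import sqlite3
from dataclasses import dataclass

RELIABILITY = {  # source reliability
    "A": "Completely reliable", "B": "Usually reliable", "C": "Fairly reliable",
    "D": "Not usually reliable", "E": "Unreliable", "F": "Reliability cannot be judged",
}
CREDIBILITY = {  # information credibility
    "1": "Confirmed by other sources", "2": "Probably true", "3": "Possibly true",
    "4": "Doubtful", "5": "Improbable", "6": "Truth cannot be judged",
}


@dataclass(frozen=True)
class AdmiraltyGrade:
    reliability: str  # "A" to "F"
    credibility: str  # "1" to "6"

    def __post_init__(self) -> None:
        # Reject anything outside the 6x6 matrix.
        if self.reliability not in RELIABILITY or self.credibility not in CREDIBILITY:
            raise ValueError(f"invalid Admiralty grade: {self.reliability}{self.credibility}")

    @property
    def code(self) -> str:
        return f"{self.reliability}{self.credibility}"


def save_grade(conn: sqlite3.Connection, source_url: str, grade: AdmiraltyGrade) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS source_grades ("
        "source_url TEXT PRIMARY KEY, grade TEXT NOT NULL)"
    )
    # Re-grading a source overwrites its previous grade.
    conn.execute("INSERT OR REPLACE INTO source_grades VALUES (?, ?)", (source_url, grade.code))
    conn.commit()


conn = sqlite3.connect(":memory:")
save_grade(conn, "https://example.org/report", AdmiraltyGrade("B", "2"))
print(conn.execute("SELECT grade FROM source_grades").fetchone())  # ('B2',)
```

Validating in the value type means an invalid grade such as "G7" fails before it can reach the database.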

4.1 Identity, Social & Network Intelligence Scripts

These scripts cover the Social and Technological dimensions of the STEEPLES framework, focusing on people and their technical footprint. They support broad reconnaissance across social platforms and network infrastructure without manual, site-by-site searching.

  • username_enum.py: Integrates with Maigret and Sherlock to check a username across 3,000+ sites.
  • domain_intel.py: Performs DNS, RDAP, and Shodan lookups to map infrastructure.
  • geolocation.py: Extracts EXIF data and performs coordinate mapping for media authentication.
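One concrete piece of coordinate mapping is converting EXIF GPS values (the GPSLatitude/GPSLatitudeRef and GPSLongitude/GPSLongitudeRef tags store degrees, minutes and seconds plus a hemisphere letter) into signed decimal degrees. The minimal sketch below assumes the tags have already been read from the image; the function names are hypothetical and not taken from geolocation.py.

```python
# Minimal sketch: convert EXIF GPS degrees/minutes/seconds to signed decimal
# degrees. Tag extraction is omitted; the function names are hypothetical.
from fractions import Fraction
from typing import Sequence, Tuple, Union

Number = Union[int, float, Fraction]


def dms_to_decimal(dms: Sequence[Number], ref: str) -> float:
    degrees, minutes, seconds = (float(x) for x in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and West longitudes are negative.
    return -value if ref.strip().upper() in ("S", "W") else value


def exif_to_latlon(
    lat: Sequence[Number], lat_ref: str, lon: Sequence[Number], lon_ref: str
) -> Tuple[float, float]:
    return dms_to_decimal(lat, lat_ref), dms_to_decimal(lon, lon_ref)


# EXIF stores each component as a rational, modelled here with Fraction.
lat, lon = exif_to_latlon(
    (51, 30, Fraction(2628, 100)), "N", (0, 7, Fraction(396, 10)), "W"
)
print(round(lat, 4), round(lon, 4))  # 51.5073 -0.1277
```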

For details, see Identity, Social & Network Intelligence Scripts.

4.2 Corporate, Financial & Sanctions Intelligence Scripts

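Used primarily in Step 8 (Phase 3: Collation), these scripts interface with regulatory and financial databases to uncover corporate structures and compliance risks. The exception is financial_analysis.py, which feeds the Phase 6 reporting stage.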

  • corporate_intel.py: Aggregates data from UK Companies House, SEC EDGAR, and GLEIF.
  • sanctions_screen.py: Uses fuzzy matching to check entities against OFAC and UK HMT lists.
  • financial_analysis.py: Processes ledger data for Phase 6 financial summaries.
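Fuzzy name screening can be sketched with the standard library. Here difflib's `SequenceMatcher`, applied to accent-stripped, lowercased, token-sorted names, stands in for whatever matching algorithm sanctions_screen.py actually uses; the 0.85 threshold and the function names are assumptions. Real screening against OFAC and UK HMT lists also weighs aliases and secondary identifiers such as dates of birth, which this sketch ignores.

```python
# Hypothetical sketch of fuzzy watchlist screening using difflib.
import re
import unicodedata
from difflib import SequenceMatcher
from typing import Iterable, List, Tuple


def normalize(name: str) -> str:
    # Strip accents, casefold, drop punctuation, and sort tokens so that
    # "Doe, John" and "John Doe" compare as identical strings.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    tokens = re.findall(r"[a-z0-9]+", text.casefold())
    return " ".join(sorted(tokens))


def screen(name: str, watchlist: Iterable[str], threshold: float = 0.85) -> List[Tuple[str, float]]:
    """Return watchlist entries whose similarity to name meets the threshold, best first."""
    target = normalize(name)
    hits = []
    for listed in watchlist:
        score = SequenceMatcher(None, target, normalize(listed)).ratio()
        if score >= threshold:
            hits.append((listed, round(score, 3)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


print(screen("Doe, John", ["John Doe", "Jane Smith", "Jon Doe"]))
# [('John Doe', 1.0), ('Jon Doe', 0.933)]
```

Token sorting handles reordered name parts, while the similarity ratio tolerates spelling variants such as "Jon" for "John".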

For details, see Corporate, Financial & Sanctions Intelligence Scripts.

4.3 Entity Resolution, Graph Analysis & Chronological Matrix Scripts

These scripts transform raw collection data into structured intelligence by identifying relationships and temporal patterns.

  • entity_resolver.py: Implements the Fellegi-Sunter probabilistic framework to deduplicate the Entity Register.
  • network_graph.py: Uses NetworkX to compute PageRank and other centrality measures on directed graphs (DiGraph) of POLE relationships.
  • chronological_matrix.py: Normalizes all event timestamps to UTC and identifies gaps in the timeline.
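In the Fellegi-Sunter framework, each compared field has an m-probability (the chance it agrees given a true match) and a u-probability (the chance it agrees given a non-match). Agreement contributes log2(m/u), disagreement contributes log2((1-m)/(1-u)), and the summed weight is compared against two thresholds. The field names, m/u values and thresholds below are illustrative assumptions, not the parameters entity_resolver.py uses.

```python
# Minimal Fellegi-Sunter scoring sketch with illustrative m/u parameters.
import math
from typing import Dict, Mapping, Tuple

# field -> (m, u); values are assumptions for illustration only.
PARAMS: Dict[str, Tuple[float, float]] = {
    "surname": (0.95, 0.01),
    "date_of_birth": (0.90, 0.005),
    "city": (0.80, 0.10),
}


def match_weight(a: Mapping[str, str], b: Mapping[str, str]) -> float:
    """Sum log2 likelihood ratios over the fields both records share."""
    total = 0.0
    for field, (m, u) in PARAMS.items():
        if a.get(field) is None or b.get(field) is None:
            continue  # missing value: no evidence either way
        if a[field].strip().casefold() == b[field].strip().casefold():
            total += math.log2(m / u)  # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


def classify(weight: float, upper: float = 10.0, lower: float = 0.0) -> str:
    # Weights between the thresholds go to manual (clerical) review.
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match (clerical review)"


a = {"surname": "Smith", "date_of_birth": "1980-01-02", "city": "Leeds"}
b = {"surname": "smith", "date_of_birth": "1980-01-02", "city": "York"}
w = match_weight(a, b)
print(f"{w:.1f}", classify(w))  # 11.9 match
```

Rare fields (a low u) carry more weight when they agree, which is why a shared date of birth outweighs a shared city here.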

Relational Processing Pipeline

```mermaid
graph LR
    subgraph "Input: POLE Records"
        P["Person"]
        O["Object"]
        L["Location"]
        E["Event"]
    end

    subgraph "Processing: network_graph.py"
        P -- "owns" --> O
        P -- "located_at" --> L
        P -- "involved_in" --> E

        subgraph "Metrics"
            D["DiGraph.in_degree()"]
            PR["PageRank"]
        end
    end

    subgraph "Output: Intelligence"
        G["Influence Mapping"]
        H["Community Detection"]
    end
```
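With NetworkX, these metrics come from `nx.pagerank(G, alpha=0.85)` and `G.in_degree()` on an `nx.DiGraph`. To show what they compute, here is a dependency-free sketch of PageRank by power iteration that mirrors NetworkX's defaults for the damping factor and dangling nodes. The node IDs and relation labels are illustrative assumptions, not the CSDb schema.

```python
# Dependency-free sketch of PageRank by power iteration over POLE edges.
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, target, relation)


def pagerank(edges: Iterable[Edge], alpha: float = 0.85,
             tol: float = 1e-6, max_iter: int = 100) -> Dict[str, float]:
    out_links: Dict[str, List[str]] = {}
    nodes: Set[str] = set()
    for src, dst, _relation in edges:
        out_links.setdefault(src, []).append(dst)
        nodes.update((src, dst))
    if not nodes:
        return {}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing edges is spread evenly,
        # mirroring NetworkX's default handling of dangling nodes.
        dangling = sum(rank[node] for node in nodes if node not in out_links)
        new_rank = {node: (1 - alpha) / n + alpha * dangling / n for node in nodes}
        for src, targets in out_links.items():
            share = alpha * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        # Stop when the L1 change falls below n * tol, as NetworkX does.
        converged = sum(abs(new_rank[v] - rank[v]) for v in nodes) < n * tol
        rank = new_rank
        if converged:
            break
    return rank


def in_degree(edges: Iterable[Edge]) -> Dict[str, int]:
    """Count incoming edges per node (nodes with none are omitted)."""
    return dict(Counter(dst for _src, dst, _relation in edges))


edges = [
    ("person:alice", "object:vessel-1", "owns"),
    ("person:bob", "object:vessel-1", "owns"),
    ("person:carol", "object:vessel-1", "owns"),
    ("person:alice", "location:port", "located_at"),
]
ranks = pagerank(edges)
print(max(ranks, key=ranks.get))            # object:vessel-1
print(in_degree(edges)["object:vessel-1"])  # 3
```

The vessel with three owners ranks highest, which is the kind of shared-asset signal that influence mapping looks for.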

For details, see Entity Resolution, Graph Analysis & Chronological Matrix Scripts.
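The UTC normalization and gap detection described for chronological_matrix.py can be sketched with the standard library alone. The function names, the 24-hour gap threshold, and the policy of rejecting timestamps without a UTC offset are assumptions for illustration.

```python
# Minimal sketch: normalize ISO 8601 timestamps to UTC and flag timeline gaps.
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Tuple


def to_utc(timestamp: str) -> datetime:
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        # Refuse to guess the zone of a naive timestamp.
        raise ValueError(f"timestamp has no UTC offset: {timestamp!r}")
    return parsed.astimezone(timezone.utc)


def find_gaps(timestamps: Iterable[str], max_gap: timedelta) -> List[Tuple[datetime, datetime]]:
    """Return consecutive (earlier, later) UTC pairs separated by more than max_gap."""
    events = sorted(to_utc(ts) for ts in timestamps)
    return [(a, b) for a, b in zip(events, events[1:]) if b - a > max_gap]


events = [
    "2024-03-01T09:00:00+01:00",  # 08:00 UTC
    "2024-03-01T08:30:00+00:00",  # 08:30 UTC
    "2024-03-03T12:00:00-05:00",  # 17:00 UTC, two days later
]
for start, end in find_gaps(events, timedelta(hours=24)):
    print(start.isoformat(), "->", end.isoformat())
# 2024-03-01T08:30:00+00:00 -> 2024-03-03T17:00:00+00:00
```

Sorting only after conversion matters: the first two events look out of order in local time but are correctly sequenced once both are in UTC.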

4.4 Evidence Preservation, Content Archiving & Report Generation Scripts

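These scripts keep findings forensically sound and professionally presented. Although grouped here, source_grader.py and evidence_preservation.py run during Phase 2 (Collection); report_generator.py runs in Phase 6 (Reporting).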

  • evidence_preservation.py: Automates SHA-256 hashing and Wayback Machine submission.
  • source_grader.py: Enforces the Admiralty 6x6 grading standard (source reliability A-F, information credibility 1-6) for every piece of evidence.
  • report_generator.py: Compiles CSDb data into ICD 203-compliant briefings using Jinja2 templates.
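The hashing half of evidence preservation can be sketched with `hashlib`: record a SHA-256 digest and a UTC timestamp at collection time, then re-hash later to show the file has not changed. The record's field names are hypothetical, and Wayback Machine submission is left out of this sketch.

```python
# Minimal sketch: hash an evidence file with SHA-256 and record when it was
# hashed. Field names are hypothetical; Wayback Machine submission is omitted.
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large media files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_record(path: Path) -> Dict[str, str]:
    return {
        "file": path.name,
        "sha256": sha256_file(path),
        "hashed_at_utc": datetime.now(timezone.utc).isoformat(),
    }
```

Comparing a fresh `sha256_file(path)` against the stored `sha256` value before reporting shows whether the evidence was altered after collection.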

For details, see Evidence Preservation, Content Archiving & Report Generation Scripts.

