Investigation Scripts: Data Collection & Analysis Tools
This page provides an overview of the 15+ purpose-built Python scripts located in skills/claude-sleuth/scripts/. These tools automate the technical heavy lifting of an investigation, from initial domain reconnaissance and identity enumeration to complex graph analysis and forensic evidence preservation.
The toolkit is designed to be modular, with dependencies managed by setup.py and execution orchestrated by task_runner.py.
Scripts are mapped to specific steps in the 6-phase intelligence cycle. This ensures that advanced analytical tools (like PageRank analysis) are only used after the necessary data (POLE entities) has been collected and resolved.
| Phase | Step | Primary Scripts |
|---|---|---|
| Phase 2: Collection | 3, 5 | source_grader.py, evidence_preservation.py, content_archiver.py |
| Phase 3: Collation | 6, 7, 8 | entity_resolver.py, corporate_intel.py, domain_intel.py, username_enum.py, sanctions_screen.py |
| Phase 4: Processing | 9, 10, 11 | chronological_matrix.py, network_graph.py, geolocation.py |
| Phase 6: Reporting | 13, 14, 15 | report_generator.py, financial_analysis.py |
The following diagram illustrates how natural language investigative requests translate into specific Python classes and functions within the script directory.
Data Ingestion and Entity Linkage Flow
graph TD
subgraph "Natural Language Space"
A["'Verify this source'"]
B["'Link these people'"]
C["'Check social media'"]
end
subgraph "Code Entity Space (scripts/)"
A --> D["source_grader.py"]
B --> E["entity_resolver.py"]
C --> F["username_enum.py"]
D --> D1["AdmiraltyGrader class"]
E --> E1["FellegiSunterMatcher"]
F --> F1["MaigretScanner"]
end
subgraph "Data Storage (CSDb)"
D1 --> G[("source_grades table")]
E1 --> H[("entities table")]
end
These scripts focus on the "Person" and "Technical" aspects of the STEEPLES framework. They facilitate wide-scale reconnaissance across social platforms and network infrastructure without requiring complex manual searching.
- username_enum.py: Integrates with Maigret and Sherlock to scan 3,000+ sites.
- domain_intel.py: Performs DNS, RDAP, and Shodan lookups to map infrastructure.
- geolocation.py: Extracts EXIF data and performs coordinate mapping for media authentication.
For details, see Identity, Social & Network Intelligence Scripts.
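To illustrate the coordinate-mapping step: EXIF GPS tags store latitude and longitude as degrees, minutes, and seconds plus a hemisphere reference (N/S/E/W). The sketch below shows only that conversion to signed decimal degrees. The function name `dms_to_decimal` is hypothetical and is not taken from geolocation.py, and reading the tags out of an image file is assumed to happen elsewhere.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert an EXIF-style degrees/minutes/seconds triple to signed decimal degrees.

    `ref` is the hemisphere reference: "N" or "E" give a positive value,
    "S" or "W" give a negative value.
    """
    ref = ref.upper()
    if ref not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere reference: {ref!r}")
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if ref in ("S", "W") else value


if __name__ == "__main__":
    # Illustrative values only, not taken from any real image.
    lat = dms_to_decimal(51, 30, 0, "N")
    lon = dms_to_decimal(0, 7, 30, "W")
    print(lat, lon)
```

Signed decimal degrees are the form most mapping tools accept, which is why the hemisphere is folded into the sign rather than kept as a separate field.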
Used primarily in Step 8, these scripts interface with regulatory and financial databases to uncover corporate structures and compliance risks.
- corporate_intel.py: Aggregates data from UK Companies House, SEC EDGAR, and GLEIF.
- sanctions_screen.py: Uses fuzzy matching to check entities against OFAC and UK HMT lists.
- financial_analysis.py: Processes ledger data for Phase 6 financial summaries.
For details, see Corporate, Financial & Sanctions Intelligence Scripts.
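The page does not say which fuzzy-matching algorithm sanctions_screen.py uses, so the following is only a sketch of the general idea, using the standard library's `difflib.SequenceMatcher` as a stand-in. The function names, the 0.85 threshold, and the sample names are all assumptions made for illustration; the names are fictional and do not come from any real list.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop punctuation, and collapse whitespace before comparing."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(cleaned.split())


def screen(name, watchlist, threshold=0.85):
    """Return (entry, score) pairs whose similarity to `name` meets the threshold.

    Results are sorted from the strongest match to the weakest.
    """
    target = normalize(name)
    hits = []
    for entry in watchlist:
        score = SequenceMatcher(None, target, normalize(entry)).ratio()
        if score >= threshold:
            hits.append((entry, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sample_list = ["Ivan Petrov", "John Smith"]  # fictional entries
    for entry, score in screen("IVAN  PETROV.", sample_list):
        print(f"{entry}: {score:.2f}")
```

A threshold below 1.0 catches transliteration and spelling variants at the cost of more false positives, so any hit from a sketch like this would still need analyst review.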
These scripts transform raw collection data into structured intelligence by identifying relationships and temporal patterns.
- entity_resolver.py: Implements the Fellegi-Sunter probabilistic framework to deduplicate the Entity Register.
- network_graph.py: Utilizes NetworkX to calculate PageRank and centrality within POLE relationship DiGraphs.
- chronological_matrix.py: Normalizes all event data to UTC and identifies timeline gaps.
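For readers unfamiliar with Fellegi-Sunter scoring, the sketch below shows the core calculation. Each compared field contributes log2(m/u) when the two records agree and log2((1-m)/(1-u)) when they disagree, where m is the probability of agreement among true matches and u is the probability of agreement among non-matches. The field names, the m/u values, and the function name are assumptions for illustration and are not taken from entity_resolver.py.

```python
import math

# Hypothetical (m, u) probabilities per field; real values would be estimated from data.
FIELD_PARAMS = {
    "name": (0.95, 0.01),
    "dob": (0.90, 0.005),
    "city": (0.80, 0.10),
}


def match_weight(record_a, record_b, params=FIELD_PARAMS):
    """Sum the Fellegi-Sunter agreement/disagreement weights over all fields.

    A missing value is treated as a disagreement in this simplified sketch.
    """
    total = 0.0
    for field, (m, u) in params.items():
        a, b = record_a.get(field), record_b.get(field)
        if a is not None and a == b:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


if __name__ == "__main__":
    # Fictional records for illustration.
    r1 = {"name": "jane doe", "dob": "1980-01-01", "city": "leeds"}
    r2 = {"name": "jane doe", "dob": "1980-01-01", "city": "york"}
    print(round(match_weight(r1, r2), 2))
```

Record pairs scoring above an upper threshold would be merged, pairs below a lower threshold kept separate, and pairs in between sent for manual review; choosing those thresholds is outside the scope of this sketch.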
Relational Processing Pipeline
graph LR
subgraph "Input: POLE Records"
P["Person"]
O["Object"]
L["Location"]
E["Event"]
end
subgraph "Processing: network_graph.py"
P -- "owns" --> O
P -- "located_at" --> L
P -- "involved_in" --> E
subgraph "Metrics"
D["DiGraph.in_degree()"]
PR["PageRank"]
end
end
subgraph "Output: Intelligence"
G["Influence Mapping"]
H["Community Detection"]
end
For details, see Entity Resolution, Graph Analysis & Chronological Matrix Scripts.
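The two metrics named in the pipeline diagram can be sketched without any dependencies. The code below is a pure-Python stand-in, not the NetworkX-based code in network_graph.py: it counts in-degree and runs a basic power-iteration PageRank over a list of directed edges. The entity names and relationships are fictional, and the function names are hypothetical.

```python
from collections import Counter


def in_degree(edges):
    """Count incoming edges per node for a list of (source, target) pairs."""
    return Counter(target for _, target in edges)


def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Basic power-iteration PageRank; dangling nodes spread their rank evenly."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source in nodes:
            if out_links[source]:
                share = damping * rank[source] / len(out_links[source])
                for target in out_links[source]:
                    new_rank[target] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Fictional POLE relationships: (source, target) pairs.
    edges = [
        ("alice", "acme_ltd"),  # owns
        ("bob", "acme_ltd"),    # owns
        ("alice", "london"),    # located_at
        ("bob", "meeting"),     # involved_in
    ]
    print(in_degree(edges).most_common(1))
    ranks = pagerank(edges)
    print(max(ranks, key=ranks.get))
```

With NetworkX installed, `DiGraph.in_degree()` and `networkx.pagerank()` provide the equivalent metrics, with more options than this sketch covers.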
The final tier of scripts ensures that all findings are forensically sound and professionally presented.
- evidence_preservation.py: Automates SHA-256 hashing and Wayback Machine submission.
- source_grader.py: Enforces the Admiralty 6x6 grading standard for every piece of evidence.
- report_generator.py: Compiles CSDb data into ICD 203-compliant briefings using Jinja2 templates.
For details, see Evidence Preservation, Content Archiving & Report Generation Scripts.
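The hashing half of evidence preservation can be sketched with the standard library alone. The example below computes a SHA-256 digest for captured content and wraps it in a small record with a UTC timestamp. The function names and record fields are assumptions for illustration, not the schema used by evidence_preservation.py, and Wayback Machine submission is not shown.

```python
import hashlib
from datetime import datetime, timezone


def sha256_bytes(content):
    """Return the hex SHA-256 digest of an in-memory bytes object."""
    return hashlib.sha256(content).hexdigest()


def sha256_file(path, chunk_size=65536):
    """Hash a file in chunks so large captures do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preservation_record(url, content):
    """Build a minimal, hypothetical record for a captured artifact."""
    return {
        "url": url,
        "sha256": sha256_bytes(content),
        "size_bytes": len(content),
        "captured_at_utc": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    record = preservation_record("https://example.com/", b"abc")
    print(record["sha256"])
```

Recording the digest at capture time lets anyone later re-hash the stored file and confirm it has not changed since collection.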