Entity Resolution Graph Analysis Chronological Matrix Scripts

elb-pr edited this page Apr 7, 2026 · 2 revisions

Entity Resolution, Graph Analysis & Chronological Matrix Scripts

This section covers the core analytical scripts used in Phases 3 and 4 of the intelligence cycle. These tools facilitate the transition from raw collected data to structured intelligence by resolving duplicate entities, mapping complex relational networks, and constructing normalized timelines for gap analysis.

1. Entity Resolution (entity_resolver.py)

The EntityResolver class implements a hybrid approach to record linkage, combining deterministic matching with the Fellegi-Sunter probabilistic framework. It is designed to deduplicate POLE (Person, Object, Location, Event) entities across disparate data sources.

Deterministic vs. Probabilistic Matching

The resolver first attempts to find matches using a set of DETERMINISTIC_KEYS: unique identifiers that are treated as a definite match when they agree. If no deterministic match is found, it calculates a probabilistic score as a weighted combination of per-field similarities.

| Field | Weight | Logic |
| --- | --- | --- |
| name | 0.40 | Jaro-Winkler or Levenshtein similarity |
| dob | 0.20 | Exact match (1.0) or same year (0.5) |
| address | 0.15 | String similarity of normalized address strings |
| nationality | 0.10 | Binary match of normalized country strings |
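
As an illustration of how the field weights listed above could combine into a single score, here is a minimal sketch. It is not the code from entity_resolver.py: the function names are hypothetical, `difflib.SequenceMatcher` stands in for Jaro-Winkler or Levenshtein similarity, and dividing by the total weight of the compared fields (so that identical records score 1.0) is an assumption, since the page does not say whether the real script normalizes.

```python
from difflib import SequenceMatcher

# Weights copied from the table; everything else here is illustrative.
FIELD_WEIGHTS = {"name": 0.40, "dob": 0.20, "address": 0.15, "nationality": 0.10}


def string_similarity(a: str, b: str) -> float:
    """Stand-in for Jaro-Winkler/Levenshtein: difflib ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def dob_similarity(a: str, b: str) -> float:
    """1.0 for identical ISO dates (YYYY-MM-DD), 0.5 when only the year matches."""
    if a == b:
        return 1.0
    return 0.5 if a[:4] == b[:4] else 0.0


def exact_similarity(a: str, b: str) -> float:
    """Binary match on case-folded, trimmed strings."""
    return float(a.lower().strip() == b.lower().strip())


COMPARATORS = {
    "name": string_similarity,
    "dob": dob_similarity,
    "address": string_similarity,
    "nationality": exact_similarity,
}


def weighted_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted average of per-field similarities over fields present in both records."""
    total, used_weight = 0.0, 0.0
    for field_name, weight in FIELD_WEIGHTS.items():
        a, b = rec_a.get(field_name), rec_b.get(field_name)
        if not a or not b:
            continue  # assumption: a field missing on either side is skipped
        total += weight * COMPARATORS[field_name](a, b)
        used_weight += weight
    return total / used_weight if used_weight else 0.0
```

Under these assumptions, two records that agree on everything except the day and month of birth score about 0.88, which would still clear the default threshold of 0.80.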

Clustering Implementation

The script uses a Union-Find (Disjoint Set Union) algorithm to group records into clusters. Two records are merged whenever they match, either deterministically or because their probabilistic score exceeds match_threshold (default 0.80).
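
The clustering step can be sketched with a generic Union-Find structure. The class below is a textbook version (path compression plus union by rank) written for illustration. The class and method names are assumptions and may not match what entity_resolver.py actually defines.

```python
from __future__ import annotations


class UnionFind:
    """Disjoint-set structure over record indices 0..size-1."""

    def __init__(self, size: int) -> None:
        self.parent = list(range(size))  # every record starts as its own root
        self.rank = [0] * size

    def find(self, i: int) -> int:
        """Return the root of i's cluster, shortening the path as we go."""
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i: int, j: int) -> None:
        """Merge the clusters containing i and j."""
        root_i, root_j = self.find(i), self.find(j)
        if root_i == root_j:
            return
        if self.rank[root_i] < self.rank[root_j]:
            root_i, root_j = root_j, root_i
        self.parent[root_j] = root_i  # attach the shallower tree to the deeper one
        if self.rank[root_i] == self.rank[root_j]:
            self.rank[root_i] += 1

    def clusters(self) -> list[list[int]]:
        """Group record indices by root."""
        groups: dict[int, list[int]] = {}
        for i in range(len(self.parent)):
            groups.setdefault(self.find(i), []).append(i)
        return sorted(groups.values())


uf = UnionFind(5)
uf.union(0, 1)  # e.g. a deterministic match
uf.union(1, 3)  # e.g. a score above the threshold
print(uf.clusters())  # [[0, 1, 3], [2], [4]]
```

Because matches are transitive under this scheme, records 0 and 3 end up in the same cluster even though they were never compared directly.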

Entity Resolution Logic Flow

graph TD
    subgraph "Code Entity Space: EntityResolver"
        A["add_record()"] --> B{"_deterministic_match()"}
        B -- "Match Found" --> C["union(i, j)"]
        B -- "No Match" --> D{"_probabilistic_score()"}
        D -- "Score > threshold" --> C
        D -- "Score <= threshold" --> E["New Cluster"]
        C --> F["resolve()"]
        E --> F
    end
    
    subgraph "Natural Language Space: POLE Deduplication"
        F --> G["Entity Clusters"]
        G --> H["Deduplicated Register"]
    end

2. Network Graph Analysis (network_graph.py)

The InvestigationGraph class uses networkx to build a directed graph (DiGraph) of investigative entities. The choice of a directed graph is intentional: it preserves the direction of each relationship (e.g., Person A owns Company B), which is critical for calculating meaningful centrality metrics.

Data Model and Visualization

The script maps POLE entities to specific visual styles (colors and shapes) for export via pyvis.

  • Nodes: Supports types such as person, organisation, domain, email, and vehicle.
  • Edges: Defines relationship types like director_of, shareholder_of, alias_of, and financial_link.

Investigative Metrics

The centrality_report() function provides analytical insights into the network structure:

  1. In-Degree Centrality: Highlights entities that many others point to, which are candidate high-value targets.
  2. Out-Degree Centrality: Highlights entities with many outgoing relationships, which act as connectors or aggregators.
  3. PageRank: Calculates recursive authority scores, useful for finding influential nodes in sparse graphs.
  4. Strongly Connected Components: Detects circular ownership or feedback loops.
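
To make two of these metrics concrete, the sketch below computes degree centrality and strongly connected components by hand on a plain edge list, using only the standard library. It does not reproduce centrality_report(). The function names and the sample entities are hypothetical, and the degree scores are divided by n - 1, the same normalization networkx applies in its degree centrality functions. The component search is Tarjan's algorithm in its simple recursive form, which is fine for small graphs.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return (in_centrality, out_centrality), each degree divided by n - 1."""
    unique_edges = set(edges)  # a DiGraph holds at most one edge per ordered pair
    nodes = {node for edge in unique_edges for node in edge}
    in_deg = dict.fromkeys(nodes, 0)
    out_deg = dict.fromkeys(nodes, 0)
    for source, target in unique_edges:
        out_deg[source] += 1
        in_deg[target] += 1
    denom = max(len(nodes) - 1, 1)
    return (
        {n: d / denom for n, d in in_deg.items()},
        {n: d / denom for n, d in out_deg.items()},
    )


def circular_groups(edges):
    """Return strongly connected components with more than one member (Tarjan)."""
    graph = defaultdict(list)
    nodes = set()
    for source, target in edges:
        graph[source].append(target)
        nodes.update((source, target))

    index, low, stack, on_stack, components = {}, {}, [], set(), []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return [c for c in components if len(c) > 1]


# Hypothetical ownership edges: (source, target) means "source owns target".
edges = [("Alice", "HoldCo"), ("HoldCo", "OpCo"), ("OpCo", "HoldCo"), ("Bob", "HoldCo")]
in_c, out_c = degree_centrality(edges)
print(max(in_c, key=in_c.get))   # HoldCo
print(circular_groups(edges))    # one group containing HoldCo and OpCo
```

Here HoldCo has the highest in-degree centrality, and the HoldCo/OpCo pair is reported as a circular ownership structure.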

Graph Construction and Export

graph LR
    subgraph "Code Entity Space: network_graph.py"
        A["InvestigationGraph"] --> B["add_person() / add_org()"]
        B --> C["add_edge(source, target)"]
        C --> D["centrality_report()"]
        C --> E["export_html()"]
    end

    subgraph "Analytical Output"
        D --> F["PageRank / Betweenness"]
        E --> G["Pyvis Interactive Map"]
    end

3. Chronological Matrix (chronological_matrix.py)

The ChronologicalMatrix script manages UTC-normalized timelines. It transforms various datetime formats into a standard TimelineEvent object to facilitate temporal analysis.

Normalization and Parsing

The parse_to_utc() function handles 12+ datetime formats, stripping "Z" suffixes and applying UTC offsets to ensure all events are comparable on a single linear timeline.
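
A normalization routine of this kind might look like the sketch below. It is an illustration rather than the script's own parse_to_utc(): the function name, the short format list, and the choice to treat timestamps without an offset as already being in UTC are all assumptions.

```python
from datetime import datetime, timezone

# Illustrative subset only; the real script is described as handling 12+ formats.
CANDIDATE_FORMATS = (
    "%Y-%m-%dT%H:%M:%S%z",
    "%Y-%m-%d %H:%M:%S%z",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%d/%m/%Y %H:%M",
    "%Y-%m-%d",
)


def parse_to_utc_sketch(raw: str) -> datetime:
    """Try each known format in turn and return a timezone-aware UTC datetime."""
    text = raw.strip()
    if text.endswith(("Z", "z")):
        text = text[:-1] + "+0000"  # a trailing "Z" denotes UTC
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # not this format, try the next one
        if parsed.tzinfo is None:
            # Assumption: values without an offset are already UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    raise ValueError(f"Unrecognized datetime format: {raw!r}")


print(parse_to_utc_sketch("2026-04-07T14:00:00+02:00").isoformat())
# 2026-04-07T12:00:00+00:00
```

Raising on unrecognized input, rather than guessing, keeps malformed timestamps from silently landing at the wrong point on the timeline.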

Intelligence Detection Features

The matrix includes two primary automated detection algorithms:

  • Gap Detection (detect_gaps): Identifies temporal windows where intelligence is missing (by default, gaps longer than 24 hours). These are flagged as INTELLIGENCE_UNKNOWN.
  • Conflict Detection (detect_conflicts): Identifies events from different sources that describe similar activities (via keyword overlap) but occur at conflicting times within a specified window (default 30 minutes).
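
Both checks can be sketched in a few lines. The version below is a rough approximation under stated assumptions, not the script's implementation: events are plain dicts, the `source` and `description` keys are hypothetical, keyword overlap is measured as shared words longer than three characters, and a conflict requires timestamps that differ by more than zero but no more than the window.

```python
from datetime import datetime, timedelta, timezone


def find_gaps(timestamps, threshold=timedelta(hours=24)):
    """Return (start, end, flag) for consecutive events more than `threshold` apart."""
    ordered = sorted(timestamps)
    return [
        (earlier, later, "INTELLIGENCE_UNKNOWN")
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > threshold
    ]


def keywords(text):
    """Crude keyword set: lower-cased words longer than three characters."""
    return {word for word in text.lower().split() if len(word) > 3}


def find_conflicts(events, window=timedelta(minutes=30), min_shared=2):
    """Return pairs of event_ids from different sources that overlap in keywords
    but disagree on timing within `window`."""
    conflicts = []
    for i, first in enumerate(events):
        for second in events[i + 1:]:
            if first["source"] == second["source"]:
                continue  # only cross-source disagreements count
            delta = abs(first["utc_datetime"] - second["utc_datetime"])
            shared = keywords(first["description"]) & keywords(second["description"])
            if timedelta(0) < delta <= window and len(shared) >= min_shared:
                conflicts.append((first["event_id"], second["event_id"]))
    return conflicts


t0 = datetime(2026, 4, 1, 9, 0, tzinfo=timezone.utc)
events = [
    {"event_id": "E1", "utc_datetime": t0, "source": "cctv",
     "description": "Subject entered warehouse carrying package"},
    {"event_id": "E2", "utc_datetime": t0 + timedelta(minutes=20), "source": "informant",
     "description": "Subject entered warehouse alone"},
    {"event_id": "E3", "utc_datetime": t0 + timedelta(days=3), "source": "cctv",
     "description": "Vehicle left the depot"},
]
print(find_conflicts(events))                               # [('E1', 'E2')]
print(len(find_gaps(e["utc_datetime"] for e in events)))    # 1
```

In this sample, E1 and E2 describe the same activity 20 minutes apart from different sources, so they are paired as a conflict, and the three days of silence before E3 are reported as a gap.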

Timeline Event Schema

| Attribute | Description | Source |
| --- | --- | --- |
| utc_datetime | ISO 8601 normalized timestamp | skills/claude-sleuth/scripts/chronological_matrix.py:26 |
| source_reliability | Admiralty A-F scale | skills/claude-sleuth/scripts/chronological_matrix.py:29 |
| info_credibility | Admiralty 1-6 scale | skills/claude-sleuth/scripts/chronological_matrix.py:30 |
| conflicts_with | List of conflicting event_ids | skills/claude-sleuth/scripts/chronological_matrix.py:36 |
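
For orientation, the listed attributes could be modelled as a dataclass along the following lines. This is a guess at the shape, not a copy of TimelineEvent: the class name, the field types, the defaults, the `description` field, and the validation are all assumptions. Only the four attribute names in the schema and `event_id` come from this page.

```python
from __future__ import annotations

from dataclasses import dataclass, field

RELIABILITY_GRADES = set("ABCDEF")   # Admiralty source reliability, A (best) to F
CREDIBILITY_GRADES = set(range(1, 7))  # Admiralty information credibility, 1 (best) to 6


@dataclass
class TimelineEventSketch:
    event_id: str
    utc_datetime: str                  # assumption: stored as an ISO 8601 string
    description: str = ""              # hypothetical field, not in the schema above
    source_reliability: str = "F"      # F: reliability cannot be judged
    info_credibility: int = 6          # 6: truth cannot be judged
    conflicts_with: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.source_reliability not in RELIABILITY_GRADES:
            raise ValueError(f"source_reliability must be A-F, got {self.source_reliability!r}")
        if self.info_credibility not in CREDIBILITY_GRADES:
            raise ValueError(f"info_credibility must be 1-6, got {self.info_credibility!r}")


event = TimelineEventSketch("E1", "2026-04-07T12:00:00+00:00",
                            source_reliability="B", info_credibility=2)
event.conflicts_with.append("E2")
```

Defaulting both grades to the "cannot be judged" end of each scale means an ungraded event is never mistaken for a well-sourced one.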
