Entity Resolution, Graph Analysis, and Chronological Matrix Scripts
This section covers the core analytical scripts used in Phases 3 and 4 of the intelligence cycle. These tools facilitate the transition from raw collected data to structured intelligence by resolving duplicate entities, mapping complex relational networks, and constructing normalized timelines for gap analysis.
The EntityResolver class implements a hybrid approach to record linkage, combining deterministic matching with the Fellegi-Sunter probabilistic framework. It is designed to deduplicate POLE (Person, Object, Location, Event) entities across disparate data sources.
The resolver first attempts to find matches using a set of DETERMINISTIC_KEYS: unique identifiers that are treated as a definite match when they agree. If no deterministic match is found, it calculates a probabilistic score as a weighted combination of per-field similarities.
| Field | Weight | Logic |
|---|---|---|
| name | 0.40 | Jaro-Winkler or Levenshtein similarity |
| dob | 0.20 | Exact match (1.0) or same year (0.5) |
| address | 0.15 | String similarity of normalized address strings |
| nationality | 0.10 | Binary match of normalized country strings |
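A minimal sketch of how such a weighted score could be computed, not the script's actual implementation. It assumes records are plain dicts with ISO-formatted `dob` strings, and it swaps in the standard library's `difflib.SequenceMatcher` ratio for Jaro-Winkler or Levenshtein similarity. The helper names (`text_similarity`, `dob_similarity`, `probabilistic_score`) are illustrative. Only the four weights listed in the table are used, so the maximum score here is 0.85.

```python
from difflib import SequenceMatcher

# Weights copied from the table above; the real script may define more fields.
WEIGHTS = {"name": 0.40, "dob": 0.20, "address": 0.15, "nationality": 0.10}


def text_similarity(a: str, b: str) -> float:
    """Stand-in for Jaro-Winkler/Levenshtein: difflib ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def dob_similarity(a: str, b: str) -> float:
    """1.0 for identical dates, 0.5 for the same year (assumes YYYY-MM-DD)."""
    if a == b:
        return 1.0
    return 0.5 if a[:4] == b[:4] else 0.0


def probabilistic_score(r1: dict, r2: dict) -> float:
    """Weighted sum of field similarities over fields present in both records."""
    score = 0.0
    for field, weight in WEIGHTS.items():
        v1, v2 = r1.get(field), r2.get(field)
        if not v1 or not v2:
            continue  # a missing value contributes nothing to the score
        if field == "dob":
            sim = dob_similarity(v1, v2)
        elif field == "nationality":
            sim = 1.0 if v1.strip().lower() == v2.strip().lower() else 0.0
        else:
            sim = text_similarity(v1, v2)
        score += weight * sim
    return score
```

Skipping missing fields rather than penalizing them is one possible design choice; a stricter resolver might renormalize the weights over the fields that are actually present.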
The script uses a Union-Find (Disjoint Set Union) algorithm to group records into clusters once a match is confirmed either deterministically or by exceeding the match_threshold (default 0.80).
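The clustering step can be illustrated with a small Union-Find sketch. It assumes records are identified by integer indices and that matching has already produced a list of confirmed pairs. The `UnionFind` class and the `cluster` helper are hypothetical names, not taken from the script.

```python
class UnionFind:
    """Disjoint Set Union over record indices 0..n-1."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, i: int) -> int:
        # Walk up to the root, halving the path as we go.
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def union(self, i: int, j: int) -> None:
        root_i, root_j = self.find(i), self.find(j)
        if root_i != root_j:
            self.parent[root_j] = root_i


def cluster(n_records: int, matched_pairs: list) -> list:
    """Group record indices into clusters from pairwise match decisions."""
    uf = UnionFind(n_records)
    for i, j in matched_pairs:
        uf.union(i, j)
    groups = {}
    for i in range(n_records):
        groups.setdefault(uf.find(i), []).append(i)
    return sorted(groups.values())
```

Because union is transitive, records 0 and 2 end up in the same cluster when only the pairs (0, 1) and (1, 2) matched: `cluster(5, [(0, 1), (1, 2)])` returns `[[0, 1, 2], [3], [4]]`.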
Entity Resolution Logic Flow
graph TD
subgraph "Code Entity Space: EntityResolver"
A["add_record()"] --> B{"_deterministic_match()"}
B -- "Match Found" --> C["union(i, j)"]
B -- "No Match" --> D{"_probabilistic_score()"}
D -- "Score > threshold" --> C
D -- "Score <= threshold" --> E["New Cluster"]
C --> F["resolve()"]
E --> F
end
subgraph "Natural Language Space: POLE Deduplication"
F --> G["Entity Clusters"]
G --> H["Deduplicated Register"]
end
The InvestigationGraph class utilizes networkx to build a directed graph (DiGraph) of investigative entities. The choice of a directed graph is intentional: it preserves the relationship directionality (e.g., Person A owns Company B), which is critical for calculating meaningful centrality metrics.
The script maps POLE entities to specific visual styles (colors and shapes) for export via pyvis.
- Nodes: Supports types such as person, organisation, domain, email, and vehicle.
- Edges: Defines relationship types like director_of, shareholder_of, alias_of, and financial_link.
The centrality_report() function provides analytical insights into the network structure:
- In-Degree Centrality: Identifies high-value targets (entities that many other entities point to).
- Out-Degree Centrality: Identifies connectors or aggregators (entities that point to many others).
- PageRank: Calculates recursive authority scores, useful for finding influential nodes in sparse graphs.
- Strongly Connected Components: Detects circular ownership or feedback loops.
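To make two of these metrics concrete, here is a dependency-free sketch of in/out-degree centrality and of finding strongly connected components that contain a cycle. It is not the script's code: the real class is described as using networkx, whereas this version works on a plain list of `(source, target)` edges. The function names are illustrative, and PageRank is omitted for brevity. The degree scores are normalized by `n - 1`, which matches the convention networkx uses.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return (in_centrality, out_centrality) dicts, normalized by n - 1."""
    edges = set(edges)  # a DiGraph stores each directed edge once
    nodes = {n for edge in edges for n in edge}
    scale = 1.0 / (len(nodes) - 1) if len(nodes) > 1 else 0.0
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return ({n: in_deg[n] * scale for n in nodes},
            {n: out_deg[n] * scale for n in nodes})


def _reachable(adj, start):
    """All nodes reachable from start (including start) via directed edges."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def circular_components(edges):
    """Strongly connected components with more than one node.

    Two nodes share a component when each can reach the other. This
    simple version is quadratic and meant for small graphs only.
    """
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
    nodes = {n for edge in edges for n in edge}
    reach = {n: _reachable(adj, n) for n in nodes}
    assigned, components = set(), []
    for n in sorted(nodes):
        if n in assigned:
            continue
        comp = {m for m in reach[n] if n in reach[m]}
        assigned |= comp
        if len(comp) > 1:
            components.append(sorted(comp))
    return components


edges = [("alice", "acme"), ("bob", "acme"),
         ("acme", "holdco"), ("holdco", "acme")]
in_c, out_c = degree_centrality(edges)
loops = circular_components(edges)  # [['acme', 'holdco']]
```

In this toy ownership graph, `acme` has the highest in-degree centrality, and `acme` and `holdco` form a circular ownership loop.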
Graph Construction and Export
graph LR
subgraph "Code Entity Space: network_graph.py"
A["InvestigationGraph"] --> B["add_person() / add_org()"]
B --> C["add_edge(source, target)"]
C --> D["centrality_report()"]
C --> E["export_html()"]
end
subgraph "Analytical Output"
D --> F["PageRank / Betweenness"]
E --> G["Pyvis Interactive Map"]
end
The ChronologicalMatrix script manages UTC-normalized timelines. It transforms various datetime formats into a standard TimelineEvent object to facilitate temporal analysis.
The parse_to_utc() function handles 12+ datetime formats, stripping "Z" suffixes and applying UTC offsets to ensure all events are comparable on a single linear timeline.
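A minimal sketch of this normalization, assuming a small illustrative subset of formats (the real list is not shown here) and assuming that timestamps without any offset are already in UTC. The function name mirrors the one described above, but the body is not taken from the script.

```python
from datetime import datetime, timezone

# Hypothetical subset of accepted formats; the real script is said to cover 12+.
FORMATS = (
    "%Y-%m-%dT%H:%M:%S%z",
    "%Y-%m-%d %H:%M:%S%z",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%d/%m/%Y %H:%M",
    "%Y-%m-%d",
)


def parse_to_utc(raw: str) -> datetime:
    """Parse a datetime string and return a timezone-aware UTC datetime."""
    text = raw.strip()
    if text.endswith(("Z", "z")):
        text = text[:-1]  # "Z" means UTC; the naive result is tagged UTC below
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next candidate format
        if parsed.tzinfo is None:
            # Assumption: values without an offset are already UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc)
    raise ValueError(f"Unrecognized datetime format: {raw!r}")
```

With this sketch, `"2024-01-15T10:30:00Z"` and `"2024-01-15T12:30:00+02:00"` both normalize to the same UTC instant, which is what makes events from different sources comparable on one timeline.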
The matrix includes two primary automated detection algorithms:
- Gap Detection (detect_gaps): Identifies temporal windows where intelligence is missing (defaulting to gaps > 24 hours). These are flagged as INTELLIGENCE_UNKNOWN.
- Conflict Detection (detect_conflicts): Identifies events from different sources that describe similar activities (via keyword overlap) but occur at conflicting times within a specified window (default 30 mins).
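The two detectors could look roughly like the sketch below. Several details are assumptions rather than facts about the script: the `source` and `description` fields, the `min_shared` keyword threshold, the crude whitespace keyword extraction, and the reading of "conflicting times" as timestamps that differ but fall within the window.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class TimelineEvent:
    event_id: str
    utc_datetime: datetime
    source: str        # hypothetical field
    description: str   # hypothetical field
    conflicts_with: list = field(default_factory=list)


def detect_gaps(events, min_gap=timedelta(hours=24)):
    """Return (start, end, label) for each gap longer than min_gap."""
    ordered = sorted(events, key=lambda e: e.utc_datetime)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later.utc_datetime - earlier.utc_datetime > min_gap:
            gaps.append((earlier.utc_datetime, later.utc_datetime,
                         "INTELLIGENCE_UNKNOWN"))
    return gaps


def _keywords(event):
    """Crude keyword set: lowercase words longer than three characters."""
    return {w for w in event.description.lower().split() if len(w) > 3}


def detect_conflicts(events, window=timedelta(minutes=30), min_shared=2):
    """Cross-link events from different sources that share keywords
    but report different times within the window."""
    for i, a in enumerate(events):
        for b in events[i + 1:]:
            if a.source == b.source:
                continue
            delta = abs(a.utc_datetime - b.utc_datetime)
            shared = _keywords(a) & _keywords(b)
            if timedelta(0) < delta <= window and len(shared) >= min_shared:
                a.conflicts_with.append(b.event_id)
                b.conflicts_with.append(a.event_id)


utc = timezone.utc
events = [
    TimelineEvent("e1", datetime(2024, 1, 1, 9, 0, tzinfo=utc), "cctv",
                  "Subject entered warehouse via north gate"),
    TimelineEvent("e2", datetime(2024, 1, 1, 9, 20, tzinfo=utc), "informant",
                  "Subject entered the warehouse"),
    TimelineEvent("e3", datetime(2024, 1, 3, 12, 0, tzinfo=utc), "cctv",
                  "Vehicle left car park"),
]
gaps = detect_gaps(events)
detect_conflicts(events)
```

Here `e1` and `e2` come from different sources, share keywords, and are 20 minutes apart, so each lists the other in `conflicts_with`; the interval between `e2` and `e3` exceeds 24 hours and is reported as a single gap.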
| Attribute | Description |
|---|---|
| utc_datetime | ISO 8601 normalized timestamp (skills/claude-sleuth/scripts/chronological_matrix.py:26) |
| source_reliability | Admiralty A-F scale (skills/claude-sleuth/scripts/chronological_matrix.py:29) |
| info_credibility | Admiralty 1-6 scale (skills/claude-sleuth/scripts/chronological_matrix.py:30) |
| conflicts_with | List of conflicting event_ids (skills/claude-sleuth/scripts/chronological_matrix.py:36) |