
Phase 2A: Add 6 new data sources (ACLED, ADS-B, GDELT, InSight Crime, OpenSanctions, SpiderFoot, Unusual Whales)#2

Open
devin-ai-integration[bot] wants to merge 21 commits into devin/1775767650-sync-upstream-master from devin/1775767691-phase2a-data-sources


@devin-ai-integration devin-ai-integration bot commented Apr 9, 2026

Summary

Syncs the Phase 2A data source expansion from ltmsuma/CrucixDEVIN PR #1 into the COG-GTM fork. This adds 3 new source modules, enhances 4 existing sources, and adds corresponding dashboard panels (Narco & Organized Crime, Market Intelligence, PRC Tension Index).

Depends on PR #1 (upstream master sync) — must be merged first.

21 commits across 12 files (+1,629 / -130 lines)

Why

The upstream repo added a Phase 2A data expansion that significantly broadens intelligence coverage. This sync brings those capabilities into the COG-GTM fork so we stay at full parity.

Scope

  • New source
  • Dashboard change
  • Docs/config change

What Changed

New Sources

| Source | File | API Key? | Description |
| --- | --- | --- | --- |
| InSight Crime | apis/sources/insightcrime.mjs | No | Central America organized crime RSS feed; cross-references with OpenSanctions |
| SpiderFoot | apis/sources/spiderfoot.mjs | SPIDERFOOT_API_KEY | OSINT reconnaissance (IP, domain, email enrichment) |
| Unusual Whales | apis/sources/unusualwhales.mjs | UNUSUAL_WHALES_API_KEY | Options flow, congressional trades, dark pool prints >$5M |

Enhanced Existing Sources

| Source | Key Changes |
| --- | --- |
| ACLED | New API key auth method, API URL changed to api.acleddata.com, FLASH alerts for events with >50 fatalities |
| ADS-B | opendata.adsb.fi as free primary (no key), sensitive region detection (7 regions), PRIORITY alerts for military over hotspots |
| GDELT | Rate-limit resilience (4 retries, 5s stagger delay, 60s timeout), tone scores derived from article data, parenthesized OR queries |
| OpenSanctions | Handles totalResults as object, guards against collections 404 |

Dashboard

  • Narco & Organized Crime panel — InSight Crime headlines with sanctions cross-reference badges
  • Market Intelligence panel — Unusual Whales options flow, congressional trades (defense/energy filtered), dark pool prints
  • PRC Tension Index — Composite score in Signal Core (air activity + GDELT mentions + ISR flights)
  • Globe markers — Red markers for PRC-related intelligence, Asia Pacific default camera view
  • Loading behavior — Server mode now always fetches live API data, renders inline data as placeholder while loading

Delta Engine

  • ACLED FLASH alerts (fatalities >50) → critical severity
  • ADS-B PRIORITY alerts (military over sensitive regions)
  • GDELT tone deterioration alerts
  • InSight Crime sanctions cross-ref alerts
  • Unusual Whales options flow >$5M alerts
  • Defense committee cross-reference: FLASH alert when a defense committee member trades within 48h of a geopolitical event

Validation

  • Verified clean fast-forward merge from upstream PR branch (no conflicts)
  • git merge-tree confirms zero conflict markers against COG-GTM master
  • Not tested locally — this is a direct sync of upstream's already-iterated code (21 commits including 14 bug-fix iterations)

Config and Docs

  • .env.example — not updated in this PR (new env vars SPIDERFOOT_API_KEY, UNUSUAL_WHALES_API_KEY, ACLED_API_KEY should be documented; may need a follow-up)
  • Source count in briefing.mjs updated (29 → 30, though actual count is higher)

Source Additions

InSight Crime: Adds Central American organized crime coverage — a region not covered by existing sources. No API key required (public RSS). Degrades to empty result on fetch failure. Cross-references entities against OpenSanctions for sanctions overlap. Wired into briefing.mjs as Tier 7 and inject.mjs for the Narco panel.

SpiderFoot: Adds OSINT enrichment (IP/domain/email reconnaissance). Requires SPIDERFOOT_API_KEY; returns stub when missing. Wired into briefing.mjs as Tier 7.

Unusual Whales: Adds market intelligence — options flow anomalies, congressional trading (filtered to defense/energy), dark pool large prints. Requires UNUSUAL_WHALES_API_KEY; degrades gracefully. Wired into briefing.mjs as Tier 8 and inject.mjs for the Market Intelligence panel.

Human Review Checklist

  • innerHTML security — New dashboard panels (Narco, Market Intelligence) render fetched data via innerHTML with cleanText(). Verify cleanText() adequately sanitizes all injected content per CONTRIBUTING.md requirements.
  • ACLED API URL change — Endpoint changed from acleddata.com/api/acled/read to api.acleddata.com/acled/read. Confirm this is the correct current ACLED API endpoint.
  • PRC Tension Index scoring — Hardcoded weights (air * 1.5 + GDELT * 8 + ISR * 15, max 100). Review whether these weights produce meaningful results or could be misleading.
  • Defense committee cross-reference logic (lib/delta/engine.mjs) — Generates FLASH alerts when defense committee members trade within 48h of any geopolitical event. Review for false positive risk and whether the correlation logic is sound.
  • jarvis.html loading change — The canProbeApi block now always fetches from /api/data even when inline data exists. Verify this doesn't cause flash-of-content or data race issues.
  • GDELT fragility — 14 of 21 commits are iterative GDELT bug fixes (rate limiting, timeouts, query syntax). The 60s slow-source timeout is a workaround. Consider whether GDELT reliability is sufficient.
  • Source count mismatch — briefing.mjs says "30 sources" but this PR adds 3 new sources to the existing set. Verify the actual count is correct.
  • Missing .env.example updates — New env vars (ACLED_API_KEY, SPIDERFOOT_API_KEY, UNUSUAL_WHALES_API_KEY) are referenced in source code but .env.example is not updated in this PR.
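To make the PRC Tension Index item above easier to review, here is a minimal sketch of the stated formula. Only the weights (1.5, 8, 15) and the cap of 100 come from this PR description; the function name, the parameter names, and the assumption that each input is a raw count are illustrative and not taken from the upstream code.

```javascript
// Hypothetical helper: weights and the 100 cap are from the PR description;
// the names and the "raw count" interpretation of each input are assumptions.
function prcTensionScore({ airCount = 0, gdeltMentions = 0, isrFlights = 0 } = {}) {
  const raw = airCount * 1.5 + gdeltMentions * 8 + isrFlights * 15;
  return Math.min(100, raw); // clamp to the gauge maximum
}

// 10 aircraft, 2 GDELT mentions, 1 ISR flight: 15 + 16 + 15
console.log(prcTensionScore({ airCount: 10, gdeltMentions: 2, isrFlights: 1 })); // 46

// Seven ISR flights alone exceed the cap (7 * 15 = 105), so the gauge saturates.
console.log(prcTensionScore({ isrFlights: 7 })); // 100
```

With these weights a single ISR flight counts as much as ten ordinary air contacts, which is the kind of sensitivity the reviewer may want to check against real sweep data.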

Checklist

  • This PR stays within one bugfix or one feature family (Phase 2A sources)
  • I kept unrelated changes out of the diff
  • I considered security for any mixed-source content rendering — flagged for review above
  • I tested the changed path locally — not tested; synced from upstream

Link to Devin session: https://app.devin.ai/sessions/a046d43371284e3f8175cb76185f3ced
Requested by: @jakexcosme



devin-ai-integration bot and others added 21 commits April 7, 2026 23:30
… SpiderFoot, InSight Crime)

- ACLED: Updated auth to support API key (ACLED_API_KEY), added FLASH alert for fatalities > 50
- ADS-B: Switched to free opendata.adsb.fi endpoint, added military aircraft markers (ISR, bombers, tankers, VIP), PRIORITY alerts for sensitive regions
- OpenSanctions: Added API key support, cross-referencing logic, entity matching
- GDELT: Full integration with regional tone scoring, geographic event clustering, PRIORITY alerts for tone deterioration
- SpiderFoot: Docker container integration, scan management, findings panel
- InSight Crime: RSS feed integration (main + Mexico/Colombia/Central America), entity extraction, OpenSanctions cross-referencing, PRIORITY alerts
- Dashboard: New globe markers for military aircraft, GDELT clusters
- Delta engine: Added alert evaluation for all new sources
- News feed: Merged InSight Crime articles into ticker

Co-Authored-By: Matt Suma <matthew.suma@gmail.com>
…anctions 401, safeFetch truncation

- InSight Crime: Changed /category/ to /tag/ for Mexico/Colombia feeds, removed Central America (404)
- GDELT: Reduced tone scoring to 3 regions and shortened delay to 3s to avoid 30s timeout
- OpenSanctions: Handle 401 gracefully when no API key provided
- safeFetch: Remove 500-char truncation on rawText (broke RSS parsing)

Co-Authored-By: Matt Suma <matthew.suma@gmail.com>
- SpiderFoot: Use /scanlist, /scaneventresults, /scanstatus, /scansummary (not /api?func=)
- GDELT: Reduce tone scoring to 2 regions, shorten delay to 2s to stay under 30s timeout

Co-Authored-By: Matt Suma <matthew.suma@gmail.com>
…calls

Co-Authored-By: Matt Suma <matthew.suma@gmail.com>
…PI calls

Removes sequential rate-limited tone API calls that caused 60s+ timeouts.
Instead, computes regional tone scores from article-level tone data already
fetched in the initial query. Reverts SLOW_SOURCE_TIMEOUT_MS back to standard 30s.

Co-Authored-By: Matt Suma <matthew.suma@gmail.com>
- Article fetch timeout: 20s (was 15s default) with 0 retries
- Geo fetch timeout: 15s with 0 retries
- Removed unused toneTrend/volumeTrend functions
- Reduced inter-request delay to 500ms

Co-Authored-By: Matt Suma <matthew.suma@gmail.com>
GDELT API rejects queries with OR terms unless wrapped in parentheses.
Added auto-parenthesization in searchEvents() and geoEvents().

Co-Authored-By: Matt Suma <matthew.suma@gmail.com>
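
The auto-parenthesization described in the commit above might look roughly like the following. This is a sketch under assumptions, not the code in searchEvents() or geoEvents(): the helper names are hypothetical, and quoted phrases containing the word OR are not handled.

```javascript
// Hypothetical helpers illustrating the idea; not the upstream implementation.

// True when " OR " appears outside any parentheses.
function hasTopLevelOr(q) {
  let depth = 0;
  for (let i = 0; i < q.length; i++) {
    if (q[i] === '(') depth++;
    else if (q[i] === ')') depth--;
    else if (depth === 0 && q.startsWith(' OR ', i)) return true;
  }
  return false;
}

// Wrap the query in parentheses only when it has an unparenthesized OR.
function parenthesizeOrQuery(query) {
  const q = query.trim();
  return hasTopLevelOr(q) ? `(${q})` : q;
}

console.log(parenthesizeOrQuery('taiwan OR china'));   // (taiwan OR china)
console.log(parenthesizeOrQuery('(taiwan OR china)')); // (taiwan OR china)
console.log(parenthesizeOrQuery('taiwan strait'));     // taiwan strait
```

Checking for a top-level OR rather than blindly wrapping keeps already-correct queries unchanged, so the helper is safe to apply more than once.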
…a.com)

api.acleddata.com doesn't resolve. The correct ACLED API endpoint is
https://acleddata.com/api/acled/read per their official docs.

Co-Authored-By: Matt Suma <matthew.suma@gmail.com>
- Add retry with 6s delay when GDELT returns rate-limit text response
- Increase inter-request delay to 6s (GDELT requires 5s between requests)
- Give GDELT 60s source timeout to accommodate retry + geo query
- Fixes intermittent 0-article results from rate limiting

Co-Authored-By: Matt Suma <matthew.suma@gmail.com>
…ne data)

GDELT ArtList mode doesn't include tone scores in responses.
Changed regional monitoring to track article coverage counts.
PRIORITY alerts now fire on high coverage (10+ articles) per region.

Co-Authored-By: Matt Suma <matthew.suma@gmail.com>
- Add 3s initial delay to avoid rate-limit collisions with concurrent sources
- Increase retry count from 2 to 3 for resilience against rate limiting
- GDELT enforces 1 request per 5 seconds; concurrent sweeps can trigger this

Co-Authored-By: Matt Suma <matthew.suma@gmail.com>
…age)

Co-Authored-By: Matt Suma <matthew.suma@gmail.com>
Co-Authored-By: Matt Suma <matthew.suma@gmail.com>
…es to 4

Co-Authored-By: Matt Suma <matthew.suma@gmail.com>
Co-Authored-By: Matt Suma <matthew.suma@gmail.com>
…nSight Crime to news feed sources

Co-Authored-By: Matt Suma <matthew.suma@gmail.com>
…ry for InSight Crime

Co-Authored-By: Matt Suma <matthew.suma@gmail.com>
…e D)

Co-Authored-By: Matt Suma <matthew.suma@gmail.com>
…rades, dark pool

- New collector: apis/sources/unusualwhales.mjs (3 endpoints, 15min refresh)
- Market Intelligence panel in dashboard with top 10 options flow, congressional trades (defense/energy/aerospace), dark pool >$5M
- PRIORITY alert: options flow >$5M in defense/energy/maritime stocks
- FLASH alert: defense committee member trades within 48h of geopolitical alert
- Financial event globe markers at company HQ locations (cyan)
- Sensor Grid entry for Market Intel with live counts
- Map legend updated with Market Intel marker

Co-Authored-By: Matt Suma <matthew.suma@gmail.com>
…rid, tension index, red markers

- Default globe view changed from WORLD to ASIA PACIFIC (lat 20, lon 118)
- PRC Intelligence section in Sensor Grid: PLAN Vessels, Taiwan ADIZ, SCS Activity, PRC State Media
- PRC WATCH status indicator in topbar alongside existing WARTIME STAGFLATION RISK
- Red markers for PRC-attributed activity (Chinese aircraft in Taiwan Strait/SCS)
- PRC military aircraft from ADS-B rendered as large red markers
- Strategic labels: PRC WATCH ZONE, TAIWAN ADIZ, SOUTH CHINA SEA
- PRC Tension Index gauge in Signal Core panel (score 0-100, REDUCED/NORMAL/HEIGHTENED/ELEVATED)
- Map legend updated with PRC Activity entry (red)
- CSS for PRC gauge bar and indicator dot

Co-Authored-By: Matt Suma <matthew.suma@gmail.com>
…limited

Root cause: D.air regions depended entirely on OpenSky hotspots, which returns
HTTP 429 (rate limited). ADS-B had 331 military aircraft with valid coordinates
but only fed D.adsbMilitary, never D.air.

Fix:
- Add buildAirHotspotsFromADSB() in inject.mjs that counts ADS-B military
  aircraft into the same 10 geographic regions as OpenSky
- Use ADS-B data as automatic fallback when OpenSky is unavailable
- Increase militaryAircraft cap from 50 to 200 for better regional coverage
- Update airMeta to show 'ADS-B Military' as source when fallback is active

Co-Authored-By: Matt Suma <matthew.suma@gmail.com>
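
The regional counting described in the fix above could be sketched as follows. The bounding boxes, region names, and function name here are hypothetical placeholders: the thread does not list the 10 regions that buildAirHotspotsFromADSB() actually uses.

```javascript
// Hypothetical region boxes for illustration only (half-open intervals).
const REGIONS = [
  { name: 'Taiwan Strait', latMin: 22, latMax: 27, lonMin: 117, lonMax: 123 },
  { name: 'South China Sea', latMin: 5, latMax: 22, lonMin: 108, lonMax: 121 },
];

// Count aircraft with valid coordinates into the first region that contains them.
function countAircraftByRegion(aircraft, regions = REGIONS) {
  const counts = Object.fromEntries(regions.map(r => [r.name, 0]));
  for (const ac of aircraft) {
    if (!Number.isFinite(ac.lat) || !Number.isFinite(ac.lon)) continue; // skip bad fixes
    const region = regions.find(r =>
      ac.lat >= r.latMin && ac.lat < r.latMax &&
      ac.lon >= r.lonMin && ac.lon < r.lonMax);
    if (region) counts[region.name]++;
  }
  return counts;
}

const sample = [
  { lat: 24, lon: 120 },   // inside the Taiwan Strait box
  { lat: 10, lon: 115 },   // inside the South China Sea box
  { lat: 50, lon: 0 },     // outside both boxes
  { lat: null, lon: 118 }, // missing latitude, ignored
];
console.log(countAircraftByRegion(sample));
```

A caller could then use these counts only when the OpenSky result is empty or rate limited, which matches the fallback behaviour the commit describes.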

Original prompt from Jake

Check our repo against this version: ltmsuma#1 - let me know how far behind we are and what we need to change in order to make our fork in COG-GTM up to date as /ltmsuma/ - come up with a plan for this


🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR that start with 'DevinAI' or '@devin'.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


@devin-ai-integration devin-ai-integration bot left a comment


Devin Review found 5 potential issues.

View 7 additional findings in Devin Review.


Comment thread lib/delta/engine.mjs
Comment on lines +201 to +297
  // ─── ACLED FLASH alerts (fatalities > 50 in a single event) ─────────

  const acledFlash = current.acled?.flashAlerts || [];
  for (const alert of acledFlash) {
    signals.new.push({
      key: `acled_flash:${alert.headline}`,
      reason: alert.headline,
      detail: alert.detail,
      severity: 'critical',
      tier: 'FLASH',
    });
    criticalChanges += 3;
  }

  // ─── ADS-B PRIORITY alerts (military over sensitive regions) ───────

  const adsbAlerts = current.adsbMilitary?.priorityAlerts || [];
  for (const alert of adsbAlerts.slice(0, 5)) {
    signals.new.push({
      key: `adsb_priority:${alert.headline}`,
      reason: alert.headline,
      detail: alert.detail,
      severity: 'high',
      tier: 'PRIORITY',
    });
    criticalChanges++;
  }

  // ─── GDELT tone deterioration PRIORITY alerts ─────────────────────

  const gdeltAlerts = current.gdelt?.priorityAlerts || [];
  for (const alert of gdeltAlerts) {
    signals.new.push({
      key: `gdelt_tone:${alert.headline}`,
      reason: alert.headline,
      detail: alert.detail,
      severity: 'high',
      tier: 'PRIORITY',
    });
    criticalChanges++;
  }

  // ─── InSight Crime sanctions cross-ref PRIORITY alerts ────────────

  const icAlerts = current.insightCrime?.priorityAlerts || [];
  for (const alert of icAlerts) {
    signals.new.push({
      key: `ic_sanctions:${alert.headline}`,
      reason: alert.headline,
      detail: alert.detail,
      severity: 'high',
      tier: 'PRIORITY',
    });
    criticalChanges++;
  }

  // ─── Unusual Whales PRIORITY alerts (options flow >$5M) ─────────────

  const uwAlerts = current.unusualWhales?.priorityAlerts || [];
  for (const alert of uwAlerts) {
    signals.new.push({
      key: `uw_options:${alert.ticker}`,
      reason: alert.headline,
      detail: alert.detail,
      severity: 'high',
      tier: 'PRIORITY',
    });
    criticalChanges++;
  }

  // ─── Unusual Whales FLASH alerts (defense committee + geopolitical) ──

  const defenseCommitteeTrades = current.unusualWhales?.congressTrades?.defenseCommittee || [];
  const geopoliticalAlerts = [
    ...(current.gdelt?.priorityAlerts || []),
    ...(current.acled?.flashAlerts || []),
    ...(current.adsbMilitary?.priorityAlerts || []),
  ];

  if (geopoliticalAlerts.length > 0) {
    const now = new Date();
    const last48h = new Date(now.getTime() - 48 * 60 * 60 * 1000);

    for (const trade of defenseCommitteeTrades) {
      const tradeDate = new Date(trade.transactionDate);
      if (tradeDate >= last48h) {
        signals.new.push({
          key: `uw_flash:${trade.name}:${trade.ticker}`,
          reason: `DEFENSE COMMITTEE TRADE: ${trade.name} traded ${trade.ticker} within 48h of geopolitical alert`,
          detail: `${trade.txnType} ${trade.amounts} on ${trade.transactionDate}`,
          severity: 'critical',
          tier: 'FLASH',
        });
        criticalChanges += 3;
      }
    }
  }

🔴 Delta engine alerts lack dedup — persistent alerts re-reported as "new" every sweep, inflating criticalChanges

All new alert types (ACLED flash, ADS-B priority, GDELT tone, InSight Crime sanctions, Unusual Whales options, Unusual Whales FLASH) are unconditionally pushed to signals.new without any dedup against the previous run. By contrast, the existing Telegram dedup (lines 183-199) correctly checks prevHashes before adding to signals.new. A mass casualty event (fatalities > 50) persists in ACLED data for days, so every sweep will re-add criticalChanges += 3 for the same event. Similarly, a military aircraft over a sensitive region will increment criticalChanges every ~15 minutes as long as it's airborne. This inflated count is consumed by the dashboard (jarvis.html:1885), Telegram alerts (server.mjs:103), and LLM idea generation (lib/llm/ideas.mjs:147), misrepresenting the situation as continuously escalating. Additionally, _compactForStorage in lib/delta/memory.mjs:196-221 does not preserve flashAlerts, priorityAlerts, adsbMilitary, insightCrime, or unusualWhales, so even if dedup were added, the previous run data wouldn't contain these fields to compare against.

Prompt for agents
The new alert types (ACLED flash, ADS-B priority, GDELT tone, InSight Crime sanctions, Unusual Whales) in computeDelta() are pushed unconditionally to signals.new without dedup against prior runs. This is inconsistent with the Telegram dedup pattern at lines 183-199 which builds prevHashes from prior runs.

Two changes are needed:

1. In lib/delta/engine.mjs: For each new alert category, build a set of prior alert keys from the previous run(s) and only push to signals.new if the key is not already present. For example, collect previous acled flash headlines, previous adsb priority headlines, etc. from the previous/priorRuns data and skip alerts whose key already appeared.

2. In lib/delta/memory.mjs _compactForStorage(): Preserve the alert fields needed for dedup in subsequent runs. Add at minimum: acled.flashAlerts (array of headlines), adsbMilitary.priorityAlerts, gdelt.priorityAlerts, insightCrime.priorityAlerts, unusualWhales.priorityAlerts, and unusualWhales.congressTrades.defenseCommittee. Without storing these, the dedup in step 1 would have nothing to compare against in prior runs.
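
The two steps above can be sketched as follows. This is an illustration under assumptions rather than a patch for computeDelta() or _compactForStorage(): the helper names are hypothetical, and it assumes the previous run is stored with at least each alert's headline.

```javascript
// Hypothetical helpers; `previousAlerts` stands for whatever the prior run stored.

// Step 1: return only alerts whose key did not appear in the previous run.
function newAlertsOnly(prefix, currentAlerts, previousAlerts) {
  const seen = new Set((previousAlerts || []).map(a => `${prefix}:${a.headline}`));
  const fresh = [];
  for (const alert of currentAlerts || []) {
    const key = `${prefix}:${alert.headline}`;
    if (seen.has(key)) continue; // already reported, do not count it again
    seen.add(key);               // also collapses duplicates within this run
    fresh.push({ key, alert });
  }
  return fresh;
}

// Step 2: when compacting a run for storage, keep just the headlines dedup needs.
function compactAlertHeadlines(alerts) {
  return (alerts || []).map(a => ({ headline: a.headline }));
}

const previous = compactAlertHeadlines([{ headline: 'Event A', detail: 'stored last sweep' }]);
const current = [{ headline: 'Event A' }, { headline: 'Event B' }, { headline: 'Event B' }];
console.log(newAlertsOnly('acled_flash', current, previous).map(f => f.key));
// [ 'acled_flash:Event B' ]
```

Only the entries returned by the first helper would be pushed to signals.new and counted toward criticalChanges, so a persistent event contributes once instead of on every sweep.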

Valid finding — all new alert types are pushed unconditionally to signals.new without dedup, unlike the Telegram dedup pattern. And _compactForStorage doesn't preserve the alert fields needed for comparison.

Since this is a direct sync of upstream code from ltmsuma/CrucixDEVIN PR #1, the fix should be a follow-up PR. Two changes needed:

  1. In engine.mjs: Build prior alert key sets from previous runs and skip duplicates
  2. In memory.mjs: Preserve acled.flashAlerts, adsbMilitary.priorityAlerts, gdelt.priorityAlerts, insightCrime.priorityAlerts, unusualWhales.priorityAlerts, and unusualWhales.congressTrades.defenseCommittee

Happy to implement this as a follow-up if requested.

Comment thread apis/briefing.mjs

  export async function fullBriefing() {
-   console.error('[Crucix] Starting intelligence sweep — 29 sources...');
+   console.error('[Crucix] Starting intelligence sweep — 30 sources...');

🟡 Source count log message says 30 but there are 32 sources

The fullBriefing() function logs "Starting intelligence sweep — 30 sources" but the allPromises array contains 32 entries (29 original + SpiderFoot, InSightCrime, UnusualWhales). The computed sourcesQueried in the output object is correct (derived from sources.length), but the startup log is misleading.

Suggested change
-   console.error('[Crucix] Starting intelligence sweep — 30 sources...');
+   console.error('[Crucix] Starting intelligence sweep — 32 sources...');

Correct — the log says "30 sources" but allPromises actually has 32 entries after this PR adds SpiderFoot, InSight Crime, and Unusual Whales. The runtime sourcesQueried count is correct, but the startup log is misleading. Upstream bug — can fix in a follow-up.
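
One way to keep the startup log from drifting again is to derive the number from the array instead of hardcoding it. The sketch below assumes a helper named sweepBanner, which is hypothetical; in fullBriefing() the argument would presumably be allPromises.length, logged once that array has been built.

```javascript
// Hypothetical helper: builds the startup message from the real source count.
function sweepBanner(sourceCount) {
  return `[Crucix] Starting intelligence sweep — ${sourceCount} sources...`;
}

// Stand-in for the allPromises array described in the finding (32 entries).
const allPromises = new Array(32).fill(null);
console.error(sweepBanner(allPromises.length));
```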

}

export async function briefing() {
const apiKey = process.env.UNUSUAL_WHALES_API_KEY;

🔴 New environment variables not documented in .env.example, violating CONTRIBUTING.md

CONTRIBUTING.md mandates: "document any required environment variables in .env.example and README.md". This PR introduces four new environment variables — ACLED_API_KEY, OPENSANCTIONS_API_KEY, UNUSUAL_WHALES_API_KEY, and SPIDERFOOT_URL — but neither .env.example nor README.md were updated. This is a violation of the mandatory contributing rules.

Prompt for agents
CONTRIBUTING.md requires that new environment variables are documented in both .env.example and README.md.

The following new env vars need to be added to .env.example:
- ACLED_API_KEY: Used in apis/sources/acled.mjs as the preferred auth method (set alongside ACLED_EMAIL)
- OPENSANCTIONS_API_KEY: Required for opensanctions.mjs search endpoint (free key at opensanctions.org/api)
- UNUSUAL_WHALES_API_KEY: Required for unusualwhales.mjs options flow, congressional trades, dark pool data
- SPIDERFOOT_URL: Optional, defaults to http://localhost:5001, used by apis/sources/spiderfoot.mjs

Also update README.md with setup instructions for these new sources.
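
As a complement to documenting the variables, a small startup check could report which of them are unset. This is a sketch only: the function name is hypothetical, the list of names is taken from the finding above, and SPIDERFOOT_URL is optional, so its absence is informational rather than an error.

```javascript
// Variable names come from the review finding; the helper itself is hypothetical.
const PHASE_2A_ENV_VARS = [
  'ACLED_API_KEY',
  'OPENSANCTIONS_API_KEY',
  'UNUSUAL_WHALES_API_KEY',
  'SPIDERFOOT_URL', // optional, defaults to http://localhost:5001 per the finding
];

// Return the names that are absent or blank in the given environment object.
function missingEnvVars(env, names = PHASE_2A_ENV_VARS) {
  return names.filter(name => !env[name] || String(env[name]).trim() === '');
}

// Example with a fake environment; a real caller would pass process.env.
console.log(missingEnvVars({ ACLED_API_KEY: 'abc', SPIDERFOOT_URL: '  ' }));
// [ 'OPENSANCTIONS_API_KEY', 'UNUSUAL_WHALES_API_KEY', 'SPIDERFOOT_URL' ]
```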

Acknowledged — this is a valid gap. Four new env vars (ACLED_API_KEY, OPENSANCTIONS_API_KEY, UNUSUAL_WHALES_API_KEY, SPIDERFOOT_URL) are used in the source code but not documented in .env.example or README.md. This is inherited from the upstream PR. Can fix in a follow-up.

Comment on lines +119 to +129
  try {
    sanctionsHits = await crossReference(entityList);
    // PRIORITY alerts for sanctions matches
    priorityAlerts = sanctionsHits.map(hit => ({
      tier: 'PRIORITY',
      headline: `SANCTIONS MATCH: "${hit.name}" found in InSight Crime + OpenSanctions`,
      detail: `Matched: ${hit.matches.map(m => m.caption).join(', ')} (datasets: ${hit.matches.flatMap(m => m.datasets || []).slice(0, 3).join(', ')})`,
    }));
  } catch (e) {
    // Cross-referencing is best-effort
  }

🟡 crossReference makes dozens of failing API calls when OPENSANCTIONS_API_KEY is not set

insightcrime.mjs:120 calls crossReference(entityList) which invokes matchEntity for up to 30 entities. Each matchEntity call goes through safeFetch with default retries: 1, hitting the OpenSanctions API. Without OPENSANCTIONS_API_KEY, the API returns 401 and safeFetch retries each call once (with 2s delay). This wastes ~6-12 seconds of the source's 30-second timeout budget on requests guaranteed to fail. The opensanctions.mjs briefing() function at opensanctions.mjs:127 has a proper guard for missing API key, but crossReference() does not, and insightcrime.mjs calls it directly without checking.

Suggested change
-   try {
-     sanctionsHits = await crossReference(entityList);
-     // PRIORITY alerts for sanctions matches
-     priorityAlerts = sanctionsHits.map(hit => ({
-       tier: 'PRIORITY',
-       headline: `SANCTIONS MATCH: "${hit.name}" found in InSight Crime + OpenSanctions`,
-       detail: `Matched: ${hit.matches.map(m => m.caption).join(', ')} (datasets: ${hit.matches.flatMap(m => m.datasets || []).slice(0, 3).join(', ')})`,
-     }));
-   } catch (e) {
-     // Cross-referencing is best-effort
-   }
+   if (process.env.OPENSANCTIONS_API_KEY) {
+     try {
+       sanctionsHits = await crossReference(entityList);
+       // PRIORITY alerts for sanctions matches
+       priorityAlerts = sanctionsHits.map(hit => ({
+         tier: 'PRIORITY',
+         headline: `SANCTIONS MATCH: "${hit.name}" found in InSight Crime + OpenSanctions`,
+         detail: `Matched: ${hit.matches.map(m => m.caption).join(', ')} (datasets: ${hit.matches.flatMap(m => m.datasets || []).slice(0, 3).join(', ')})`,
+       }));
+     } catch (e) {
+       // Cross-referencing is best-effort
+     }
+   }

Valid — crossReference() doesn't guard on OPENSANCTIONS_API_KEY, so it fires up to 30 safeFetch calls that all 401 with retries, wasting 6-12 seconds. The suggested fix of wrapping the call in if (process.env.OPENSANCTIONS_API_KEY) would prevent this. Upstream code — can fix in a follow-up.

Comment thread apis/sources/gdelt.mjs
Comment on lines +97 to +98
cluster.lat = (cluster.lat + points[j].lat) / 2;
cluster.lon = (cluster.lon + points[j].lon) / 2;

🟡 Geo-clustering centroid drifts toward later points instead of computing true average

In clusterGeoPoints, the centroid update cluster.lat = (cluster.lat + points[j].lat) / 2 gives equal weight to the current centroid and the new point regardless of cluster size. With points at latitudes [0, 0, 90], the centroid after all three is 45 instead of the correct 30. This biases centroids toward later-added points, and because subsequent point-to-cluster distance checks use the drifted centroid, it can cause incorrect cluster assignments — points that should be grouped together may end up in separate clusters.

Suggested change
-   cluster.lat = (cluster.lat + points[j].lat) / 2;
-   cluster.lon = (cluster.lon + points[j].lon) / 2;
+   // Update centroid as weighted average
+   const n = cluster.points.length;
+   cluster.lat = (cluster.lat * (n - 1) + points[j].lat) / n;
+   cluster.lon = (cluster.lon * (n - 1) + points[j].lon) / n;
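
The drift is easy to reproduce in isolation. The sketch below uses two hypothetical helper functions (not the code in clusterGeoPoints) and only latitudes, to compare the pairwise update with the incremental mean from the suggested change on the [0, 0, 90] example.

```javascript
// Hypothetical helpers for illustration; latitudes only.

// Original update rule: average the current centroid with each new point.
function pairwiseCentroid(lats) {
  let c = lats[0];
  for (let i = 1; i < lats.length; i++) c = (c + lats[i]) / 2;
  return c;
}

// Suggested rule: incremental mean, where n is the size after adding the point.
function runningMean(lats) {
  let c = lats[0];
  for (let i = 1; i < lats.length; i++) {
    const n = i + 1;
    c = (c * (n - 1) + lats[i]) / n;
  }
  return c;
}

console.log(pairwiseCentroid([0, 0, 90])); // 45
console.log(runningMean([0, 0, 90]));      // 30
```

Note that the suggested change reads n from cluster.points.length, so it assumes the new point has already been pushed onto the cluster before the centroid is updated.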

Good catch on the centroid drift. The running average (cluster.lat + points[j].lat) / 2 gives equal weight to the centroid and the new point regardless of cluster size. The suggested fix using (cluster.lat * (n-1) + points[j].lat) / n would compute the true weighted average. This is upstream code — can fix in a follow-up.

devin-ai-integration bot added a commit that referenced this pull request Apr 13, 2026
…e, OpenSanctions, SpiderFoot, Unusual Whales)