Scripts/retrieval/sql_tools.py is the deterministic numeric retrieval engine for the Silver layer.
Its job is to convert MetadataExtraction into auditable DuckDB outputs without LLM-side numeric fabrication risk.
Primary guarantees:
- metric routing is whitelist-constrained and ontology-aware,
- ticker matching is strict and never fuzzy,
- time windows are anchored to ingestion reality rather than wall-clock assumptions,
- percentage fields preserve source-native units,
- every SQL path emits typed lineage anchors, a structured
citation_contract, and audit logs, - schema drift is surfaced early through contract validation at startup.
Environment variables governing this module:
| Variable | Default | Description |
|---|---|---|
SILVER_MAX_TICKERS |
5 |
Maximum tickers processed per query; excess tickers are dropped and logged as TICKER_CAP |
IV_RANK_LOOKBACK_DAYS |
180 |
ATM IV percentile ranking window in calendar days |
DuckDB session configuration: in-memory database (:memory:), threads=4, memory_limit='2GB'.
flowchart TD
A[MasterRetriever] --> B[SilverSQLTool.query_parquet_by_metadata]
B --> C[Publish time_predicates for handlers]
C --> D[Partition supported and unsupported metrics]
D --> E[Enforce ticker cap and strict ticker normalization]
E --> F[Dispatch to metric handler]
F --> G[DuckDB query over Silver Parquet]
G --> H[Assemble values + lineage anchors + status]
H --> I[Write SQL audit log]
flowchart LR
A[MetadataExtraction] --> B[metric canonicalization]
B --> C[handler dispatch]
C --> D{handler family}
D -->|options| E[options SQL]
D -->|macro| F[macro SQL]
D -->|gpr| G[gpr SQL]
E --> H[values + anchors]
F --> H
G --> H
H --> I[status payload]
| Handler | Core Logic | Key Outputs |
|---|---|---|
put_call_ratio |
aggregates put/call volume and open_interest by snapshot_date, then returns the latest row inside the active window |
pcr_volume, pcr_open_interest, pcr_status |
options_analysis |
selects one ATM-like contract per day using ABS(moneyness_pct) then lower spread_pct, builds daily ATM IV series, computes ROUND(100 * PERCENT_RANK() OVER (ORDER BY atm_iv), 2) over the lookback window, returns the latest observation; also computes put/call IV skew and OTM wing IVs |
latest_atm_iv, latest_atm_iv_rank_pct, latest_iv_skew, latest_otm_put_iv, latest_otm_call_iv, iv_rank_lookback_days, data_freshness |
liquidity_analysis |
latest-snapshot aggregate of SUM(volume), SUM(open_interest), AVG(spread_pct), and liquid contract count |
<ticker>_daily_option_volume, <ticker>_open_interest, <ticker>_avg_spread_pct, <ticker>_liquid_contracts, <ticker>_market_impact_risk |
pricing_spread |
latest liquid DTE-bounded quote snapshot around the active option surface | <ticker>_underlying_price, <ticker>_avg_bid, <ticker>_avg_ask, <ticker>_avg_spread_pct, <ticker>_avg_last_price |
macro_analysis |
latest macro-series row after optional ETF-to-index alias translation | <ticker>_last_value, <ticker>_mom_change or <ticker>_daily_change |
geopolitical_analysis |
latest monthly GPR row with percentile and MoM trend classification | gpr_index_level, gpr_percentile, gpr_trend |
These unit contracts are critical and should not be reinterpreted downstream:
| Field | Stored Unit | Retrieval Rule |
|---|---|---|
spread_pct |
percentage points, for example 2.5 means 2.5% |
never multiply by 100 again |
daily_change_pct |
percentage points | render as-is |
mom_change_pct |
percentage points | render as-is |
gpr_percentile |
percentage points, for example 97.58 means 97.58% |
render as a display percentage string only |
latest_atm_iv_rank_pct |
percentile score on a 0-100 scale |
already multiplied by 100 inside SQL via PERCENT_RANK() |
Additional notes:
latest_atm_iv_rank_pctisNULLwhen fewer than two daily ATM IV observations exist in the ranking window.moneyness_pctis used only for ATM contract selection priority; it is not returned as a final display field here.
- Metric resolution order:
- exact dispatcher key
- normalized exact match
- ontology-backed fuzzy match through
difflib
- Unsupported metrics are surfaced through:
status.unsupported_metricsstatus.unsupported_metrics_message
- Tickers remain strict:
.upper().strip()only- capped by
SILVER_MAX_TICKERS - overflow is logged through
TICKER_CAP
- Anchor source:
config/runtime/collect_data_state.json - Dataset-specific keys:
options->options_dailymacro->macro_trading_dailygpr->gpr_monthly
- Fallback:
date.today()when runtime state is unavailable
When MasterRetriever supplies compiled TimePredicate objects:
silver.optionsuses the predicate'sstart_date,end_date, andwindow_daysdirectly,silver.gprinherits monthly widening semantics fromtime_adapter.py,- handlers do not re-derive time widening locally.
When predicates are absent:
- local fallback behavior uses
schema.TIME_WINDOW_DAYS, put_call_ratiohonors explicit short windows such astoday,yesterday, andpast_week,- the historical 5-day floor is applied only when the user did not provide a valid time signal.
put_call_ratioWHERE symbol = ?snapshot_date BETWEEN start_date AND anchor- grouped by date, latest row only
options_analysisis_liquid = truedte BETWEEN 7 AND 45- IV-rank lookback bounded by
IV_RANK_LOOKBACK_DAYS
liquidity_analysis- latest available
snapshot_date - no strict date
WHEREclause
- latest available
pricing_spreadis_liquid = truedte BETWEEN 7 AND 45
macro_analysis- optional ETF-to-index aliasing, for example
SPY -> ^GSPC - latest row per symbol
- optional ETF-to-index aliasing, for example
geopolitical_analysis- latest monthly row from GPR parquet
On initialization, _validate_parquet_contract() verifies:
- parquet globs resolve to files,
- required columns exist,
- parquet metadata is readable.
Statuses surfaced in the audit log:
OKNO_FILESMISSING_COLUMNSREAD_FAILED
| Field | Type | Description | Produced By | Path |
|---|---|---|---|---|
values |
Dict[str, Any] |
final numeric and string outputs consumed by downstream agents | handler aggregation | Scripts/retrieval/sql_tools.py |
lineage_anchors |
List[str] |
deterministic anchor IDs such as PCR_AGG_*, IVRANK_*, LIQ_*, PX_*, MACRO_*, GPR_* |
handler return payloads | Scripts/retrieval/sql_tools.py |
citation_contract |
Dict[str, Any] |
per-metric structured object with preferred_anchor, audit_lineage_anchors, legacy_aliases, observed_at, and source_channel; built by _build_citation_contract() |
_build_citation_contract() |
Scripts/retrieval/sql_tools.py |
citation_anchor_map |
Dict[str, str] |
flat {metric_key: preferred_anchor} compatibility shim derived from citation_contract |
query_parquet_by_metadata() |
Scripts/retrieval/sql_tools.py |
status.unsupported_metrics |
List[str] |
requested metrics that failed canonical mapping | _partition_requested_metrics() |
Scripts/retrieval/sql_tools.py |
status.unsupported_metrics_message |
str |
user-facing explanation with supported examples | _build_unsupported_metric_message() |
Scripts/retrieval/sql_tools.py |
status.error |
str | None |
explicit fatal state such as NO_SUPPORTED_METRICS |
query_parquet_by_metadata() |
Scripts/retrieval/sql_tools.py |
Each handler emits deterministic anchor IDs embedded in lineage_anchors and indexed in citation_anchor_map. The Checker uses these anchors to verify numeric claims against silver_context_frozen.
| Handler | Anchor pattern |
|---|---|
| Put/Call Ratio | PCR_AGG_{TICKER}_{snapshot_date} |
| Options IV / Skew | IVRANK_{TICKER}_{date}, IVSKEW_{TICKER}_{date}, contract_symbol (OTM wings) |
| Liquidity | LIQ_{TICKER}_{snapshot_date} |
| Pricing / Spread | PX_{TICKER}_{snapshot_date} |
| Macro | MACRO_{TICKER}_{obs_date} or MACRO_{TICKER}_AS_{INDEX}_{obs_date} (ETF alias path) |
| GPR | GPR_{YYYYMM} |
| Artifact | Description | Path Pattern |
|---|---|---|
| SQL retrieval audit log | schema checks, SQL range mode, anchors, latency, execution errors | logs/Parquet_Query/<YYYY-MM-DD>/sql_retrieval_audit.log |
Important audit events written by this module:
SCHEMA_CHECKSQL_RANGESTART_QUERYFUZZY_MATCHUNSUPPORTED_METRICSTICKER_CAP— logged when the requested ticker count exceedsSILVER_MAX_TICKERS; the dropped tickers are recorded alongside the kept setEXECUTION_ERRORFINISH_QUERY
Recommended validation flow:
python -c "from Scripts.retrieval.sql_tools import SilverSQLTool; t=SilverSQLTool(); print('sql_tool_ok', bool(t.metric_dispatcher))"
python Scripts/tests/test_router_e2e.py
python -m Scripts query "Past week SPY put/call ratio and IV skew with liquidity risk"What to validate:
- unsupported metrics produce deterministic status rather than crashes,
- strict ticker handling never fuzzy-matches symbols,
- percentage-point fields are not multiplied twice,
put_call_ratioandoptions_analysisfollow the same explicit short-window semantics,lineage_anchorsare present for successful handler outputs,- SQL audit logs show the correct
SQL_RANGEmode and effective window.
Scripts/retrieval/sql_tools.pyScripts/retrieval/master_retriever.pyScripts/retrieval/time_adapter.pyScripts/retrieval/schema.pyScripts/core/financial_ontology.py