api: estimate counts by default, cache /blockchain burn totals#144

Open: NayiemW wants to merge 1 commit into main from fix/api-large-node-perf

NayiemW commented May 15, 2026

Two performance fixes for the public API that surface on long-running mainnet nodes. Both came out of debugging why api.solar.org was returning empty lists and 500s under explorer poll load.

1. Default estimateTotalCount to true

packages/api/src/defaults.ts had:

estimateTotalCount: !!process.env.CORE_API_ESTIMATED_TOTAL_COUNT,

With the env var unset (the case on most installs) this evaluates to false, which means AbstractRepository.listByExpression skips the EXPLAIN-based estimate and runs a real COUNT(*) inside a REPEATABLE READ transaction on the api connection. The api connection has statement_timeout = apiConnectionTimeout (3000ms by default). SELECT COUNT(*) FROM blocks on a 16M-row mainnet table takes ~30s — it always times out, hits the outer catch in listByExpression, and returns:

{ results: [], totalCount: 0, meta: { totalCountIsEstimate: false } }

The controller wraps that into a 200 response with totalCount: 0 and an empty data array. Every list endpoint (/blocks, /transactions, /blocks/missed, etc.) silently returns no data, while /blocks/:id keeps working because it uses findManyByExpression which doesn't go through this path. Single-row lookups OK, lists broken.
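The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the actual `listByExpression` code: the function and parameter names (`listWithExactCount`, `countRows`, `fetchRows`, `timedOutCount`) are hypothetical, and the order of the two queries is an assumption. Only the shape of the fallback object is taken from the description.

```typescript
// Hypothetical types and names; only the fallback shape mirrors the PR text.
interface ListResult<T> {
    results: T[];
    totalCount: number;
    meta: { totalCountIsEstimate: boolean };
}

async function listWithExactCount<T>(
    countRows: () => Promise<number>,
    fetchRows: () => Promise<T[]>,
): Promise<ListResult<T>> {
    try {
        // Assumption: the exact count runs first, so a timeout here
        // prevents the rows from ever being returned.
        const totalCount = await countRows();
        const results = await fetchRows();
        return { results, totalCount, meta: { totalCountIsEstimate: false } };
    } catch {
        // The error is swallowed, so the caller cannot tell
        // "table is empty" apart from "query was cancelled".
        return { results: [], totalCount: 0, meta: { totalCountIsEstimate: false } };
    }
}

// Simulates a COUNT(*) cancelled by statement_timeout.
const timedOutCount = (): Promise<number> =>
    Promise.reject(new Error("canceling statement due to statement timeout"));
```

Calling `listWithExactCount(timedOutCount, ...)` resolves normally with an empty result, which is why the HTTP layer has nothing to turn into an error status.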

This patch flips the default to true and turns the env var into an explicit opt-out:

estimateTotalCount: process.env.CORE_API_ESTIMATED_TOTAL_COUNT !== "false",

totalCountIsEstimate is already exposed in the response meta, so clients that care can distinguish. Anyone who needs the exact count on a small node or in tests can still set CORE_API_ESTIMATED_TOTAL_COUNT=false.
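To make the difference between the two expressions concrete, the sketch below evaluates both against a few candidate values of the env var. The helper names `oldDefault` and `newDefault` are illustrative only; the bodies are the two expressions quoted above with `process.env.CORE_API_ESTIMATED_TOTAL_COUNT` replaced by a parameter.

```typescript
// Old expression: any non-empty string enables estimates, unset disables them.
function oldDefault(raw: string | undefined): boolean {
    return !!raw;
}

// New expression: estimates are on unless the value is exactly "false".
function newDefault(raw: string | undefined): boolean {
    return raw !== "false";
}

const candidates: Array<string | undefined> = [undefined, "", "true", "0", "false"];
for (const raw of candidates) {
    const label = raw === undefined ? "(unset)" : JSON.stringify(raw);
    console.log(label, "old:", oldDefault(raw), "new:", newDefault(raw));
}
```

Two consequences follow from the expressions themselves: under the old default, setting the variable to `"false"` actually enabled estimates, because any non-empty string is truthy; under the new one, only the exact string `"false"` opts out, so values such as `"0"` or `"FALSE"` leave estimates on.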

2. Cache /blockchain burn totals

BlockchainController.index recomputes the burn totals on every request:

const fees = Utils.BigNumber.make(await this.transactionRepository.getFeesBurned());
const transactions = Utils.BigNumber.make(await this.transactionRepository.getBurnTransactionTotal());

getFeesBurned is SELECT COALESCE(SUM(burned_fee), 0) FROM transactions — an unbounded SUM. On a 13M-row table that's a 25-40s parallel seq scan. Postgres won't switch to index-only scan on the existing transactions_burned_fee index until the visibility map is current, which doesn't happen between autovacuums.

Under explorer poll load (every dashboard hits /blockchain for the "API is healthy" indicator, supply, and burn stats) these queries pile up, holding api connections. When the timeout was raised to let them complete, load average reached 8+, because each request kicked off another 40s scan while previous ones were still running. Keeping the 3s timeout makes them return 500 instead of piling up, but the explorer still shows "Unable to reach Solar API".

The two SUM results only change when a new block with burned fees or a burn transaction is forged. Cache them in-process with a 10-minute TTL and a single-flight refresh guard so concurrent requests can't trigger multiple background refreshes. Foreground requests always return immediately from the cache; the refresh runs at most once every TTL window regardless of load. On a cold cache (process start) the first response is zeros for ~30s while the first refresh runs — acceptable for an endpoint where the alternative is 500ing forever.
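The caching behaviour described above (serve the cached value immediately, refresh in the background at most once per TTL window, never run two refreshes at once) can be sketched as follows. This is an illustration under assumptions, not the code in this PR: the class name `BackgroundRefreshCache`, its constructor parameters, and the injectable `now` clock are all hypothetical.

```typescript
// Hypothetical helper; names and structure are illustrative only.
class BackgroundRefreshCache<T> {
    private value: T;
    private fetchedAt = Number.NEGATIVE_INFINITY; // forces a refresh on first get()
    private refreshing = false; // single-flight guard
    private readonly load: () => Promise<T>;
    private readonly ttlMs: number;
    private readonly now: () => number;

    constructor(load: () => Promise<T>, ttlMs: number, initial: T, now: () => number = Date.now) {
        this.load = load;
        this.ttlMs = ttlMs;
        this.value = initial;
        this.now = now;
    }

    // Always returns synchronously with whatever is cached.
    get(): T {
        const stale = this.now() - this.fetchedAt >= this.ttlMs;
        if (stale && !this.refreshing) {
            this.refreshing = true;
            void this.refresh();
        }
        return this.value;
    }

    private async refresh(): Promise<void> {
        try {
            this.value = await this.load();
            this.fetchedAt = this.now();
        } catch {
            // Keep serving the previous value. fetchedAt is unchanged,
            // so the next get() starts another attempt.
        } finally {
            this.refreshing = false;
        }
    }
}

// Usage sketch (loader body is a placeholder for the two SUM queries):
// const burnTotals = new BackgroundRefreshCache(loadBurnTotals, 10 * 60 * 1000, { fees: "0", transactions: "0" });
// const { fees, transactions } = burnTotals.get();
```

In this sketch a failed refresh does not advance the TTL window, so the next request retries. Whether the actual patch retries immediately or waits out the TTL after a failure is not stated in the description.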

height, id, and supply are still computed fresh on each request — those are cheap.

Testing

Verified on api.solar.org (Solar Core 4.3.1):

  • Before: /blocks?limit=1 → { totalCount: 0, data: [] }. /blockchain → 500 after 3s.
  • After: /blocks?limit=1 → totalCount: 16,266,352 with real data in ~60ms. /blockchain → 200 in ~25ms, returning cached burn totals (refreshed in the background).

estimateTotalCount in defaults previously evaluated to false when
CORE_API_ESTIMATED_TOTAL_COUNT was unset, which routed every list
endpoint through a real COUNT(*) inside a REPEATABLE READ transaction
on the api connection. statement_timeout (3s) kills COUNT(*) on a
mainnet-sized blocks or transactions table; the catch returns
totalCount: 0 with an empty data array under a 200, so clients see
silently empty results. Flip the default to true and treat the env
var as an explicit opt-out (=false).

BlockchainController.index calls two unbounded SUMs over transactions
per request, each ~30s on mainnet. Under explorer polling these pile
up holding api connections (load average 8+ observed under load).
Cache the result in-process with a 10-minute TTL and a single-flight
background refresh — values only change on blocks carrying burned
fees or burn transactions, and the endpoint's main consumers are
dashboards where that staleness is fine.
@NayiemW NayiemW requested a review from alessiodf as a code owner May 15, 2026 09:27