Add RPC ingestion load test driven by synthetic apply-load ledger bundles #741
cjonas9 wants to merge 71 commits into
⏳ Load test launching on
@@ -0,0 +1,344 @@
#!/usr/bin/env bash
We require Go to be installed to run RPC and the ingestion load test, so I wonder if most of the logic in this bash script could live in a Go file. I think a Go program would be easier to understand and maintain than a large bash script.
Yes, definitely. Some shell is required, but cramming it all into one shell script is excessive and messy (though I did like that it kept everything for the instance in one place as user data). I'll see about refactoring some of this out.
⏳ Load test launching on
❌ Ingest load test failed (run 27727364842 on
⏳ Load test launching on
❌ Ingest load test failed (run 27728512552 on
⏳ Load test launching on
📈 Ingest load test —
| Profile | Ledgers | Wall-clock | Ledgers/sec | ms/ledger | p50 / p95 / p99 ms |
|---|---|---|---|---|---|
| apply-load-v27-oz | 1000 | 1250.049s | 0.80 | 1251.30 | 1174.999 / 1749.997 / 2025 |
| apply-load-v27-sac | 1000 | 1156.650s | 0.86 | 1156.65 | 1174.997 / 1250 / 1299.999 |
| apply-load-v27-soroswap | 1000 | 827.400s | 1.21 | 827.40 | 825.001 / 900.001 / 974.999 |

| Metric | Value |
|---|---|
| Ledgers replayed | 3000 |
| Initial DB ledger count | 120960 |
| Overall throughput | 0.93 ledgers/sec |
| Overall ingest wall-clock | 3234.099s |
| Per-ledger p50 / p95 / p99 | 1100 / 1449.999 / 1900 ms |
| Golden DB fetch+decompress | 1180s |
| stellar-core | v27.0.0 |
| Workflow run | #27732589868 |
⏳ Load test launching on
📈 Ingest load test —
| Profile | Ledgers | Wall-clock | Ledgers/sec | ms/ledger | p50 / p95 / p99 ms |
|---|---|---|---|---|---|
| apply-load-v27-oz | 1000 | 1240.774s | 0.81 | 1242.02 | 1150.001 / 1725 / 2050 |
| apply-load-v27-sac | 1000 | 1139.350s | 0.88 | 1139.35 | 1149.998 / 1225.001 / 1275.001 |
| apply-load-v27-soroswap | 1000 | 825.875s | 1.21 | 825.87 | 825.001 / 900.001 / 950 |

| Metric | Value |
|---|---|
| Ledgers replayed | 3000 |
| Initial DB ledger count | 120960 |
| Overall throughput | 0.94 ledgers/sec |
| Overall ingest wall-clock | 3205.999s |
| Per-ledger p50 / p95 / p99 | 1099.998 / 1449.999 / 1900 ms |
| Golden DB fetch+decompress | 1451s |
| stellar-core | v27.0.0 |
| Workflow run | #27769496808 |
⏳ Load test launching on
⏳ Load test launching on
📈 Ingest load test —
| Profile | Ledgers | Wall-clock | Ledgers/sec | ms/ledger | p50 / p95 / p99 ms |
|---|---|---|---|---|---|
| apply-load-v27-oz | 1000 | 1234.525s | 0.81 | 1235.76 | 1150 / 1700.001 / 1925 |
| apply-load-v27-sac | 1000 | 1138.699s | 0.88 | 1138.70 | 1149.999 / 1225 / 1275 |
| apply-load-v27-soroswap | 1000 | 829.349s | 1.21 | 829.35 | 825.002 / 900.001 / 974.999 |

| Metric | Value |
|---|---|
| Ledgers replayed | 3000 |
| Initial DB ledger count | 120960 |
| Overall throughput | 0.94 ledgers/sec |
| Overall ingest wall-clock | 3202.573s |
| Per-ledger p50 / p95 / p99 | 1099.998 / 1450.001 / 1824.999 ms |
| Golden DB fetch+decompress | 2440s |
| stellar-core | v27.0.0 |
| Workflow run | #27782520036 |
📈 Ingest load test —
| Profile | Ledgers | Wall-clock | Ledgers/sec | ms/ledger | p50 / p95 / p99 ms |
|---|---|---|---|---|---|
| apply-load-v27-oz | 1000 | 1234.300s | 0.81 | 1235.54 | 1150 / 1674.999 / 1925 |
| apply-load-v27-sac | 1000 | 1137.950s | 0.88 | 1137.95 | 1149.999 / 1225 / 1275 |
| apply-load-v27-soroswap | 1000 | 829.175s | 1.21 | 829.17 | 849.998 / 900.001 / 975 |

| Metric | Value |
|---|---|
| Ledgers replayed | 3000 |
| Initial DB ledger count | 120960 |
| Overall throughput | 0.94 ledgers/sec |
| Overall ingest wall-clock | 3201.424s |
| Per-ledger p50 / p95 / p99 | 1099.998 / 1450 / 1824.999 ms |
| Golden DB fetch+decompress | 2446s |
| stellar-core | v27.0.0 |
| Workflow run | #27789957778 |
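
The throughput and ms/ledger columns in these tables follow from the ledger count and the wall-clock time. A minimal sketch of that arithmetic, using the overall figures and the sac row from workflow run #27789957778 above; the function names are illustrative and are not taken from the test code:

```python
def ledgers_per_sec(ledgers: int, wall_clock_s: float) -> float:
    """Throughput: ledgers ingested per second of wall-clock time."""
    return ledgers / wall_clock_s


def ms_per_ledger(ledgers: int, wall_clock_s: float) -> float:
    """Mean ingest time per ledger, in milliseconds."""
    return wall_clock_s * 1000.0 / ledgers


# Figures copied from the tables above (workflow run #27789957778).
overall = ledgers_per_sec(3000, 3201.424)
sac_rate = ledgers_per_sec(1000, 1137.950)
sac_ms = ms_per_ledger(1000, 1137.950)

print(f"overall: {overall:.2f} ledgers/sec")
print(f"sac: {sac_rate:.2f} ledgers/sec, {sac_ms:.2f} ms/ledger")
```

Rounded to two decimals, these agree with the reported 0.94 ledgers/sec overall and 0.88 ledgers/sec for sac. The oz rows report an ms/ledger slightly above wall-clock divided by 1000 (for example 1235.54 against 1234.300s), so that column is not simply this formula for every profile; the sketch does not try to reproduce that difference.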
What
This PR implements a repeatable CI ingestion load test against a full database holding 7 days of ledgers. The approximate design is here:

The GHA workflow for this test is currently triggered on pushes to this branch (apply-load), but will later be changed to trigger on any release or on PR comments stating "run load test".

The workflow benchmarks RPC ingestion end-to-end on an ephemeral c5.2xlarge: it launches the box and pulls a mainnet-scale golden DB (~307GB, 1-week retention window), a BUILD_TESTS stellar-core, and three apply-load ledger bundles from S3 (sha-verified). Once the box has downloaded and decompressed this data, its gp3 volume is throttled to 125 MiB/s; the test then ingests the bundles and posts a per-profile results table to the run summary / PR.
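
The "sha-verified" step can be pictured as hashing each downloaded artifact and comparing it against a pinned digest before use. The sketch below assumes SHA-256 (the description does not say which SHA variant is used), and `sha256_of` and `verify` are hypothetical helper names, not functions from this PR:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large artifacts never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> None:
    """Raise ValueError if the file's digest differs from the pinned digest."""
    actual = sha256_of(path)
    # Normalise the pinned value: digest files often carry whitespace or uppercase hex.
    if actual != expected_hex.strip().lower():
        raise ValueError(f"sha256 mismatch for {path}: got {actual}")
```

Chunked reads matter here because the golden DB archive is far larger than the memory of the instance; failing loudly on a mismatch keeps a corrupted download from silently skewing the benchmark.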
Main Pieces:
- `integrationtest/ingest_loadtest_test.go::TestIngestSyntheticLedgers`: byte-concatenates N bundles into one continuous stream (the backend rebases ledger seqs per ledger, so per-bundle seq resets are harmless), ingests onto the golden DB with retention trimming live, verifies exact classic/soroban op counts via parallel `getTransactions` walkers, and reports per-profile wall-clock, ledgers/sec, ms/ledger, and latency quantiles.
- `loadtest/testdata/apply-load-v27-*-cfg`: config files specifying three O3 target tx profiles, 1,000 ledgers each: sac (1,000 soroban TPL), oz (900), soroswap (250). All three generate these plus 1,000 classic payments per ledger. The ledger bundles (for local usage or S3) are created offline by `stellar-core apply-load`.
- `.github/workflows/load-test.yml`: push-triggered orchestrator. OIDC-assumes into AWS, launches an ephemeral `c5.2xlarge` (Ubuntu 22.04, 500GB gp3) with the runner script as user-data (shipped verbatim, TARGET_SHA/RUN_ID passed via a two-line env preamble), waits for SSM registration, delegates polling to the script, writes the results table to the step summary (and a PR comment when one exists), fails the job on a fail verdict or timeout, and always terminates the instance.
- `run-load-test.sh`: both halves of the run in one self-contained script, coordinated by a /tmp marker protocol. `orchestrate` (on the GHA runner): polls the box over SSM, drives the gp3 downshift handshake (500 -> 125 MiB/s after downloads complete, so fetches are fast but the benchmark runs on throttled I/O), and relays verdict + results as step outputs.

Why
CI testing and benchmarking of RPC ingestion performance. This also serves as an automated regression-testing framework, though future work should extend it to report a metric that lets a run's results be compared against historical results.
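
One possible shape for such a historical comparison is a tolerance check against a baseline run. This is only a sketch of the idea: `RunSummary`, `regressed`, and the 5% tolerance are all hypothetical and are not part of this PR. The sample values are the overall throughput and p99 from workflow runs #27732589868 and #27789957778 above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    ledgers_per_sec: float
    p99_ms: float


def regressed(current: RunSummary, baseline: RunSummary, tolerance: float = 0.05) -> list[str]:
    """Return the reasons the current run is worse than the baseline beyond the tolerance."""
    reasons = []
    # Lower throughput is worse; allow a small relative drop before flagging.
    if current.ledgers_per_sec < baseline.ledgers_per_sec * (1 - tolerance):
        reasons.append(
            f"throughput {current.ledgers_per_sec} < baseline {baseline.ledgers_per_sec}"
        )
    # Higher tail latency is worse; allow a small relative rise before flagging.
    if current.p99_ms > baseline.p99_ms * (1 + tolerance):
        reasons.append(f"p99 {current.p99_ms} ms > baseline {baseline.p99_ms} ms")
    return reasons


baseline = RunSummary(ledgers_per_sec=0.93, p99_ms=1900.0)    # run #27732589868
current = RunSummary(ledgers_per_sec=0.94, p99_ms=1824.999)   # run #27789957778
print(regressed(current, baseline))
```

With these two runs the list is empty, since the later run is slightly faster on both measures. A relative tolerance is used because the runs above already differ by a percent or so without any code change being implied.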
Known limitations
This is intended purely as a test of RPC's ingestion pipeline, measuring how it handles load in isolation (i.e. without captive core running). Future work should also automatically refresh the S3 DB and ledger bundles on a pre-determined cadence.