High-performance Rust binary application for parsing Solana Content Addressable Archive (CAR) files from Yellowstone Old Faithful and extracting swap transactions and token creations into Parquet format.
This application uses parallel processing and optimized disk I/O to scan massive ~700GB CAR archive files of Solana epochs (432k blocks, 2 days), decode transactions using the solana_tx_decoding library, and write standardized output (SwapTx and TokenCreation structs) to Parquet files using Polars.
- Primary, Nonblocked I/O Thread: Reads CAR files from disk sequentially using syscalls; block processing runs on other threads, so it does not stall disk I/O
- Rayon Thread Pool: Processes blocks in parallel on worker threads (configurable, default 48 threads)
- Tokio Broadcast Channels: Async channels send decoded data to parquet writers
- Async Parquet Writers: Batch writing to Parquet files using Polars to keep memory usage constant and efficient
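The threading layout above can be illustrated with a minimal, dependency-free sketch. It is not the application's code: `std::thread` and `std::sync::mpsc` stand in for the Rayon pool and the Tokio broadcast channels, and `Decoded`, `run_pipeline`, and the queue depth of 64 are hypothetical names and values chosen for illustration. "Decoding" here only counts bytes.

```rust
use std::sync::mpsc::{channel, sync_channel};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a decoded record (the real app emits SwapTx / TokenCreation).
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Decoded {
    pub block_index: usize,
    pub byte_len: usize,
}

/// One reader thread feeds raw blocks to `num_workers` workers; results are
/// collected by the caller, which plays the role of the writer.
pub fn run_pipeline(blocks: Vec<Vec<u8>>, num_workers: usize) -> Vec<Decoded> {
    // Bounded queue: the reader only waits when workers fall far behind.
    let (block_tx, block_rx) = sync_channel::<(usize, Vec<u8>)>(64);
    let block_rx = Arc::new(Mutex::new(block_rx));
    // Unbounded results channel, so workers never block on sending.
    let (out_tx, out_rx) = channel::<Decoded>();

    // Reader: hands blocks over in order, then drops the sender to signal the end.
    let reader = thread::spawn(move || {
        for item in blocks.into_iter().enumerate() {
            if block_tx.send(item).is_err() {
                break;
            }
        }
    });

    // Workers: take the next block, "decode" it, forward the result.
    let workers: Vec<_> = (0..num_workers.max(1))
        .map(|_| {
            let rx = Arc::clone(&block_rx);
            let tx = out_tx.clone();
            thread::spawn(move || loop {
                // The lock guard is released at the end of this statement.
                let next = rx.lock().unwrap().recv();
                match next {
                    Ok((block_index, bytes)) => {
                        let _ = tx.send(Decoded { block_index, byte_len: bytes.len() });
                    }
                    Err(_) => break, // reader finished and the queue is drained
                }
            })
        })
        .collect();
    drop(out_tx); // only the workers' clones keep the results channel open

    // Writer role: drain results until every worker has exited.
    let mut results: Vec<Decoded> = out_rx.iter().collect();
    reader.join().unwrap();
    for worker in workers {
        worker.join().unwrap();
    }
    results.sort(); // workers finish in arbitrary order
    results
}
```

Because the reader only pushes into a queue, slow decoding shows up as a full queue rather than as stalled reads, which is the property the first bullet describes.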
- Read CAR file and parse nodes using Content-Addressable Archive format
- Accumulate nodes until a complete block is found
- Send block chunks to worker thread pool for parallel processing
- On worker threads:
  - Parse raw block bytes into transaction nodes
  - Deserialize transaction data
  - Reassemble and decompress transaction metadata
  - Use solana_tx_decoding to extract swaps and token creations
- Send decoded results via broadcast channels to async parquet writers
- Batch write results to Parquet files with compression
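The "accumulate nodes until a complete block is found" and "send block chunks" steps might look roughly like the sketch below. It assumes that a block node arrives after the transaction nodes that belong to it; the `Node` and `RawBlock` types and both function names are hypothetical and much simpler than the real CAR node kinds.

```rust
/// Hypothetical, simplified node kinds; real CAR files contain more variants.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Transaction(Vec<u8>),
    Block { slot: u64 },
}

/// A complete block: its slot plus the raw transactions seen before it.
#[derive(Debug, PartialEq)]
pub struct RawBlock {
    pub slot: u64,
    pub transactions: Vec<Vec<u8>>,
}

/// Buffers transaction nodes until a block node closes them off.
/// Assumption: a block node follows its own transactions in the stream.
/// Transactions left over at the end (no closing block) are discarded.
pub fn group_into_blocks(nodes: impl IntoIterator<Item = Node>) -> Vec<RawBlock> {
    let mut blocks = Vec::new();
    let mut pending = Vec::new();
    for node in nodes {
        match node {
            Node::Transaction(bytes) => pending.push(bytes),
            Node::Block { slot } => blocks.push(RawBlock {
                slot,
                // Move the buffered transactions out and start a fresh buffer.
                transactions: std::mem::take(&mut pending),
            }),
        }
    }
    blocks
}

/// Splits complete blocks into chunks of at most `blocks_per_chunk`,
/// the unit of work that would be handed to one worker.
pub fn chunk_blocks(blocks: Vec<RawBlock>, blocks_per_chunk: usize) -> Vec<Vec<RawBlock>> {
    let size = blocks_per_chunk.max(1);
    let mut chunks = Vec::new();
    let mut current = Vec::with_capacity(size);
    for block in blocks {
        current.push(block);
        if current.len() == size {
            chunks.push(std::mem::replace(&mut current, Vec::with_capacity(size)));
        }
    }
    if !current.is_empty() {
        chunks.push(current); // final, possibly short chunk
    }
    chunks
}
```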
- Swap Transactions: Written to /mnt/car/swap_txs/swaps_epoch_{epoch}_part_{part}.parquet
  - Multiple parts per epoch if swap volume exceeds batch size (500k swaps per part)
  - Snappy compression for efficient storage
- Token Creations: Written to /mnt/car/token_creations/tokens_epoch_{epoch}.parquet
  - Includes metadata fetched from token URIs (description, twitter, website)
  - Single file per epoch
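As a small illustration of the naming scheme, the helpers below build the two output paths and estimate how many swap parts an epoch would produce. The function names are hypothetical, and the sketch makes no claim about whether the application numbers parts from 0 or 1; the part index is simply passed in.

```rust
/// Swaps per Parquet part, matching the documented default.
pub const BATCH_SIZE: usize = 500_000;

/// Path of one swap part file for an epoch (hypothetical helper).
pub fn swap_part_path(epoch: u64, part: usize) -> String {
    format!("/mnt/car/swap_txs/swaps_epoch_{epoch}_part_{part}.parquet")
}

/// Path of the single token-creation file for an epoch (hypothetical helper).
pub fn token_creations_path(epoch: u64) -> String {
    format!("/mnt/car/token_creations/tokens_epoch_{epoch}.parquet")
}

/// Number of part files needed for `total_swaps`, rounding up.
pub fn parts_needed(total_swaps: usize) -> usize {
    (total_swaps + BATCH_SIZE - 1) / BATCH_SIZE
}
```

For example, an epoch with 1,200,000 swaps would need `parts_needed(1_200_000) == 3` part files under this scheme.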
Build the release binary:
cargo build --release
The binary will be located at target/release/solana_car.
Process a single CAR file for a specific epoch:
./target/release/solana_car --epoch 860
The application expects the CAR file to be located at /mnt/car/epoch-{epoch}.car.
Ensure the following directories exist before running:
mkdir -p /mnt/car/swap_txs
mkdir -p /mnt/car/token_creations
The DlParse.sh script automates downloading CAR files from the Old Faithful archive and parsing them.
The script:
- Downloads CAR files from https://files.old-faithful.net/{epoch}/epoch-{epoch}.car using aria2c for increased download speed
- Parses each epoch using the compiled binary
- Copies parquet files to permanent storage (~/swap_txs/)
- Cleans up raw CAR files and copied parquet files to free disk space
Note: The script processes epochs in reverse order. Modify the epoch range in the script as needed.
The following constants in main.rs can be adjusted:
- NUM_THREADS: Number of worker threads in the thread pool (default: 48)
- BLOCKS_PER_THREAD: Number of blocks to batch per thread (default: 2000)
- BATCH_SIZE (in collect_swaps_to_parquet.rs): Number of swaps per parquet file part (default: 500000)
  - An increased batch size will lead to increased memory pressure as swaps accumulate in memory before being written to disk
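The link between batch size and memory can be seen in a minimal buffering sketch. `BatchBuffer` is a hypothetical type, not the writer in collect_swaps_to_parquet.rs: it holds rows in memory and hands back a full batch only once `batch_size` rows have accumulated, so peak memory grows with the batch size.

```rust
/// Hypothetical in-memory buffer that releases rows in fixed-size batches.
pub struct BatchBuffer<T> {
    batch_size: usize,
    rows: Vec<T>,
}

impl<T> BatchBuffer<T> {
    pub fn new(batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be positive");
        Self { batch_size, rows: Vec::new() }
    }

    /// Adds a row. Returns a full batch (ready to be written as one part)
    /// once `batch_size` rows are buffered, otherwise `None`.
    pub fn push(&mut self, row: T) -> Option<Vec<T>> {
        self.rows.push(row);
        if self.rows.len() >= self.batch_size {
            // Hand the rows to the caller and leave an empty buffer behind.
            Some(std::mem::take(&mut self.rows))
        } else {
            None
        }
    }

    /// Returns the final partial batch, if any rows remain.
    pub fn finish(self) -> Option<Vec<T>> {
        if self.rows.is_empty() {
            None
        } else {
            Some(self.rows)
        }
    }
}
```

At most `batch_size` rows are resident at any time, which is why raising BATCH_SIZE trades fewer, larger part files for a higher memory ceiling.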
This repository processes CAR files from the Yellowstone Old Faithful project, which provides Content Addressable Archives (CARs) of Solana's blockchain history. These archives represent verifiable, immutable snapshots of entire epochs.
Old Faithful CAR files:
- Are content-addressable for trustless retrieval
- Contain full epoch snapshots from Solana warehouse nodes
- Use CAR format (IPLD standard) optimized for Solana
- Enable rapid historical data ingestion and storage
For more information about the Old Faithful archival project, visit: https://docs.old-faithful.net
This repository is a fork of the lamports-dev/yellowstone-faithful-car-parser project from the Lamports Dev Solana Tools group.
Special thanks to the original developer: @fanatid, Kirill Fomichev, for creating the foundational CAR parsing implementation.
- solana_tx_decoding: Local dependency for transaction decoding
- solana_central: Local dependency for shared types and utilities
- polars: DataFrame library for Parquet writing
- rayon: Parallel processing thread pool
- tokio: Async runtime for broadcast channels and parquet writing