
yuno-research/solana_car


Solana Old Faithful Content Addressable Archive (CAR) Defi Activity Parser

High-performance Rust binary application for parsing Solana Content Addressable Archive (CAR) files from Yellowstone Old Faithful and extracting swap transactions and token creations into Parquet format.

Overview

This application uses parallel processing and optimized disk I/O to scan the ~700 GB CAR file for each Solana epoch (432,000 slots, roughly 2 days). It decodes transactions with the solana_tx_decoding library and writes standardized output (SwapTx and TokenCreation structs) to Parquet files using Polars.

Architecture

  • Primary, Non-blocking I/O Thread: Reads the CAR file sequentially from disk using syscalls. Block processing is handed off to other threads, so it never blocks disk I/O
  • Rayon Thread Pool: Processes blocks in parallel on worker threads (configurable, default 48 threads)
  • Tokio Broadcast Channels: Async channels send decoded data to parquet writers
  • Async Parquet Writers: Write to Parquet files in batches using Polars, which keeps memory usage bounded
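The thread layout above can be modelled with the standard library alone. This is a simplified sketch, not the crate's code. Several substitutions and placeholders are assumptions:

- `std::thread` workers sharing a channel stand in for the Rayon pool.
- `std::sync::mpsc` stands in for the Tokio broadcast channels.
- The `u64` "blocks", the `decode` function and the small constants are hypothetical placeholders.

The structure is the same as the real pipeline. A reader hands out chunks over a bounded channel, workers decode them in parallel, and a single writer flushes results in fixed-size batches.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-ins for NUM_THREADS, BLOCKS_PER_THREAD and BATCH_SIZE.
const NUM_WORKERS: usize = 4;
const CHUNK_SIZE: usize = 3;
const WRITE_BATCH: usize = 5;

// Hypothetical decoder: pretend every even "block" yields one swap.
fn decode(block: u64) -> Option<u64> {
    if block % 2 == 0 { Some(block) } else { None }
}

/// Runs reader -> workers -> writer and returns the size of each flushed batch.
fn run_pipeline(blocks: Vec<u64>) -> Vec<usize> {
    // Bounded channel: the reader blocks if workers fall behind (backpressure).
    let (chunk_tx, chunk_rx) = mpsc::sync_channel::<Vec<u64>>(NUM_WORKERS);
    let chunk_rx = Arc::new(Mutex::new(chunk_rx));
    let (out_tx, out_rx) = mpsc::channel::<u64>();

    // Reader thread: groups blocks into chunks and hands them off.
    let reader = thread::spawn(move || {
        for chunk in blocks.chunks(CHUNK_SIZE) {
            chunk_tx.send(chunk.to_vec()).unwrap();
        }
        // chunk_tx is dropped here, which closes the channel and stops the workers.
    });

    // Worker threads: pull chunks and forward decoded results.
    let workers: Vec<_> = (0..NUM_WORKERS)
        .map(|_| {
            let rx = Arc::clone(&chunk_rx);
            let tx = out_tx.clone();
            thread::spawn(move || loop {
                // The lock guard is released at the end of this statement.
                let chunk = match rx.lock().unwrap().recv() {
                    Ok(c) => c,
                    Err(_) => break,
                };
                for b in chunk {
                    if let Some(swap) = decode(b) {
                        tx.send(swap).unwrap();
                    }
                }
            })
        })
        .collect();
    drop(out_tx); // only worker clones remain, so out_rx ends once workers finish

    // Writer: buffer results and flush every WRITE_BATCH items.
    let mut buffer = Vec::with_capacity(WRITE_BATCH);
    let mut flushed = Vec::new();
    for swap in out_rx {
        buffer.push(swap);
        if buffer.len() == WRITE_BATCH {
            // A real writer would build a DataFrame and write a Parquet part here.
            flushed.push(buffer.len());
            buffer.clear();
        }
    }
    if !buffer.is_empty() {
        flushed.push(buffer.len()); // final partial batch
    }

    reader.join().unwrap();
    for w in workers {
        w.join().unwrap();
    }
    flushed
}

fn main() {
    // 20 blocks produce 10 "swaps", so this prints two batches of 5.
    println!("{:?}", run_pipeline((0..20).collect()));
}
```

Batching at the writer keeps memory proportional to the batch size rather than to the epoch size. That is the same trade-off the BATCH_SIZE constant controls.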

Processing Pipeline

  1. Read the CAR file and parse its nodes according to the Content Addressable Archive format
  2. Accumulate nodes until a complete block is found
  3. Send block chunks to worker thread pool for parallel processing
  4. On worker threads:
    • Parse raw block bytes into transaction nodes
    • Deserialize transaction data
    • Reassemble and decompress transaction metadata
    • Use solana_tx_decoding to extract swaps and token creations
  5. Send decoded results via broadcast channels to async parquet writers
  6. Batch write results to Parquet files with compression
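Step 1 relies on the CARv1 framing. A CAR file starts with a varint-prefixed header. It is followed by a sequence of sections, each consisting of a varint length, a CID and the node bytes. The sketch below parses that framing using only std. It makes these assumptions:

- Every identifier is a CIDv1.
- The input is an in-memory buffer (the real reader streams from disk).
- DAG-CBOR decoding of the header and nodes is out of scope.

The function names are hypothetical and not this crate's API.

```rust
/// Reads an unsigned LEB128 varint (multiformats style) at `pos`.
/// Returns (value, bytes consumed), or None if truncated or longer than 10 bytes.
fn read_uvarint(buf: &[u8], pos: usize) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for i in 0..10 {
        let byte = *buf.get(pos + i)?;
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// One CAR section: the CID bytes and the raw node payload.
struct Section<'a> {
    cid: &'a [u8],
    data: &'a [u8],
}

/// Byte length of a CIDv1 at the start of `buf`:
/// version, codec, then multihash (hash code, digest length, digest).
fn cidv1_len(buf: &[u8]) -> Option<usize> {
    let (version, mut pos) = read_uvarint(buf, 0)?;
    if version != 1 {
        return None; // CIDv0 is not handled in this sketch
    }
    let (_codec, n) = read_uvarint(buf, pos)?;
    pos += n;
    let (_hash_code, n) = read_uvarint(buf, pos)?;
    pos += n;
    let (digest_len, n) = read_uvarint(buf, pos)?;
    pos += n;
    let end = pos.checked_add(digest_len as usize)?;
    if end > buf.len() { None } else { Some(end) }
}

/// Splits a CARv1 buffer into its header bytes and its sections.
fn parse_car(buf: &[u8]) -> Option<(&[u8], Vec<Section<'_>>)> {
    let (header_len, n) = read_uvarint(buf, 0)?;
    let mut pos = n.checked_add(header_len as usize)?;
    let header = buf.get(n..pos)?;
    let mut sections = Vec::new();
    while pos < buf.len() {
        let (section_len, n) = read_uvarint(buf, pos)?;
        pos += n;
        let end = pos.checked_add(section_len as usize)?;
        let body = buf.get(pos..end)?;
        let cid_len = cidv1_len(body)?;
        sections.push(Section { cid: &body[..cid_len], data: &body[cid_len..] });
        pos = end;
    }
    Some((header, sections))
}

fn main() {
    // Toy input: a 1-byte header, then one section with a CIDv1
    // (version 1, codec 0x71, hash 0x12, a fake 2-byte digest) and 3 data bytes.
    let buf = [0x01u8, 0xa0, 0x09, 0x01, 0x71, 0x12, 0x02, 0xaa, 0xbb, 0x01, 0x02, 0x03];
    if let Some((header, sections)) = parse_car(&buf) {
        println!("header: {} bytes, sections: {}", header.len(), sections.len());
        for s in &sections {
            println!("cid {:02x?} -> {} data bytes", s.cid, s.data.len());
        }
    }
}
```

In the real pipeline, each section's payload would then be decoded to find its node kind. Nodes are accumulated until a block node arrives (step 2).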

Output

  • Swap Transactions: Written to /mnt/car/swap_txs/swaps_epoch_{epoch}_part_{part}.parquet
    • Multiple parts per epoch if swap volume exceeds batch size (500k swaps per part)
    • Snappy compression for efficient storage
  • Token Creations: Written to /mnt/car/token_creations/tokens_epoch_{epoch}.parquet
    • Includes metadata fetched from token URIs (description, twitter, website)
    • Single file per epoch
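The naming scheme and part splitting work out as simple arithmetic. This sketch uses the path templates and the 500k batch size listed above. The helper names are hypothetical, and the assumption that part numbers start at 0 is not confirmed by the source.

```rust
use std::path::PathBuf;

// Swaps per part, as documented for BATCH_SIZE.
const BATCH_SIZE: usize = 500_000;

/// Swap output path for one part (0-based part index is an assumption).
fn swap_part_path(epoch: u64, part: usize) -> PathBuf {
    PathBuf::from(format!(
        "/mnt/car/swap_txs/swaps_epoch_{epoch}_part_{part}.parquet"
    ))
}

/// Token creation output path: one file per epoch.
fn token_path(epoch: u64) -> PathBuf {
    PathBuf::from(format!(
        "/mnt/car/token_creations/tokens_epoch_{epoch}.parquet"
    ))
}

/// Number of part files for `total_swaps`, rounding up so the
/// remainder gets its own final part.
fn part_count(total_swaps: usize) -> usize {
    (total_swaps + BATCH_SIZE - 1) / BATCH_SIZE
}

fn main() {
    let epoch = 860;
    let total_swaps = 1_200_000; // illustrative figure, not a measured count
    for part in 0..part_count(total_swaps) {
        println!("{}", swap_part_path(epoch, part).display());
    }
    println!("{}", token_path(epoch).display());
}
```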

Building

Build the release binary:

cargo build --release

The binary will be located at target/release/solana_car.

Usage

Basic Usage

Process a single CAR file for a specific epoch:

./target/release/solana_car --epoch 860

The application expects the CAR file to be located at /mnt/car/epoch-{epoch}.car.

Output Directories

Ensure the following directories exist before running:

mkdir -p /mnt/car/swap_txs
mkdir -p /mnt/car/token_creations
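The input path convention and directory setup can also be reproduced from Rust. This is a minimal std-only sketch:

- `--epoch` is parsed by hand; the real binary's argument parser is not shown here and may behave differently.
- The input path follows the documented /mnt/car/epoch-{epoch}.car convention.
- The output directories are created with `std::fs::create_dir_all`, which behaves like `mkdir -p`.

The helper names and the `root` parameter are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns the value following `--epoch`, if present and numeric.
fn parse_epoch(args: &[String]) -> Option<u64> {
    let idx = args.iter().position(|a| a == "--epoch")?;
    args.get(idx + 1)?.parse().ok()
}

/// Expected CAR input path under `root` (the README uses /mnt/car).
fn car_path(root: &Path, epoch: u64) -> PathBuf {
    root.join(format!("epoch-{epoch}.car"))
}

/// Equivalent of the two `mkdir -p` commands, relative to `root`.
fn ensure_output_dirs(root: &Path) -> io::Result<()> {
    fs::create_dir_all(root.join("swap_txs"))?;
    fs::create_dir_all(root.join("token_creations"))?;
    Ok(())
}

fn main() -> io::Result<()> {
    let args: Vec<String> = std::env::args().collect();
    let epoch = match parse_epoch(&args) {
        Some(e) => e,
        None => {
            eprintln!("usage: solana_car --epoch <N>");
            return Ok(());
        }
    };
    let root = Path::new("/mnt/car");
    ensure_output_dirs(root)?;
    let input = car_path(root, epoch);
    if !input.exists() {
        eprintln!("expected input file not found: {}", input.display());
    }
    Ok(())
}
```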

Downloading and Parsing Multiple Epochs

The DlParse.sh script automates downloading CAR files from the Old Faithful archive and parsing them.

The script:

  1. Downloads CAR files from https://files.old-faithful.net/{epoch}/epoch-{epoch}.car using aria2c for faster downloads
  2. Parses each epoch using the compiled binary
  3. Copies the Parquet output to permanent storage (~/swap_txs/)
  4. Deletes the raw CAR file and the already-copied Parquet files to free disk space

Note: The script processes epochs in reverse order. Modify the epoch range in the script as needed.
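The per-epoch loop is sketched here in Rust. This is not the contents of DlParse.sh:

- The epoch bounds are placeholders.
- The code only builds the documented download URL and the parse command for each epoch, in descending order.
- Downloading, copying and cleanup are left as comments.

`std::process::Command` is only constructed here, never spawned. The helper names are hypothetical.

```rust
use std::process::Command;

/// Download URL for an epoch, following the Old Faithful layout.
fn car_url(epoch: u64) -> String {
    format!("https://files.old-faithful.net/{epoch}/epoch-{epoch}.car")
}

/// Epochs from `newest` down to `oldest` inclusive (reverse order, like the script).
fn epochs_descending(oldest: u64, newest: u64) -> Vec<u64> {
    (oldest..=newest).rev().collect()
}

/// Builds (but does not run) the parse command for one epoch.
fn parse_command(epoch: u64) -> Command {
    let mut cmd = Command::new("./target/release/solana_car");
    cmd.arg("--epoch").arg(epoch.to_string());
    cmd
}

fn main() {
    // Placeholder range; edit to match the epochs you want.
    for epoch in epochs_descending(858, 860) {
        let url = car_url(epoch);
        let cmd = parse_command(epoch);
        // 1. download `url` to /mnt/car/epoch-{epoch}.car (the script uses aria2c)
        // 2. run `cmd` (e.g. cmd.status()) to parse the epoch
        // 3. copy the Parquet output to ~/swap_txs/
        // 4. delete the CAR file and the copied Parquet files
        println!("{epoch}: {url} -> {:?}", cmd);
    }
}
```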

Configuration

The following constants in main.rs can be adjusted:

  • NUM_THREADS: Number of worker threads in the thread pool (default: 48)
  • BLOCKS_PER_THREAD: Number of blocks to batch per thread (default: 2000)
  • BATCH_SIZE (in collect_swaps_to_parquet.rs): Number of swaps per parquet file part (default: 500000)
    • A larger batch size increases memory pressure, since swaps accumulate in memory until each part is written to disk
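The following sketch shows how these constants interact, using their default values. The helper functions are hypothetical. The 432,000 figure is the number of slots in a Solana epoch. Some slots are skipped and produce no block, so the chunk count computed from it is an upper bound.

```rust
const NUM_THREADS: usize = 48;
const BLOCKS_PER_THREAD: usize = 2000;
const BATCH_SIZE: usize = 500_000;

// Slots per epoch; the actual block count is somewhat lower.
const SLOTS_PER_EPOCH: usize = 432_000;

/// Chunks handed to the pool for `blocks` blocks (ceiling division).
fn chunk_count(blocks: usize) -> usize {
    (blocks + BLOCKS_PER_THREAD - 1) / BLOCKS_PER_THREAD
}

/// Blocks under active processing when every worker is busy with one chunk
/// (chunks still waiting in channels are not counted).
fn blocks_in_progress() -> usize {
    NUM_THREADS * BLOCKS_PER_THREAD
}

fn main() {
    println!("chunks per epoch (upper bound): {}", chunk_count(SLOTS_PER_EPOCH));
    println!("blocks being processed at once: {}", blocks_in_progress());
    println!("swaps buffered before a part is written: {}", BATCH_SIZE);
}
```

With the defaults, an epoch yields at most 216 chunks, and all 48 workers busy together hold 96,000 blocks. Raising BLOCKS_PER_THREAD or NUM_THREADS increases how much block data is resident at once.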

About Old Faithful

This repository processes CAR files from the Yellowstone Old Faithful project, which provides Content Addressable Archives (CARs) of Solana's blockchain history. These archives represent verifiable, immutable snapshots of entire epochs.

Old Faithful CAR files:

  • Are content-addressable for trustless retrieval
  • Contain full epoch snapshots from Solana warehouse nodes
  • Use CAR format (IPLD standard) optimized for Solana
  • Enable rapid historical data ingestion and storage

For more information about the Old Faithful archival project, visit: https://docs.old-faithful.net

Attribution

This repository is a fork of the lamports-dev/yellowstone-faithful-car-parser project from the Lamports Dev Solana Tools group.

Special thanks to the original developer: @fanatid, Kirill Fomichev, for creating the foundational CAR parsing implementation.

Dependencies

  • solana_tx_decoding: Local dependency for transaction decoding
  • solana_central: Local dependency for shared types and utilities
  • polars: DataFrame library for Parquet writing
  • rayon: Parallel processing thread pool
  • tokio: Async runtime for broadcast channels and parquet writing

