Handling Long String Storage in Solana Programs (On-chain Size Limits & Off-chain Hashing Trade-offs) #54

@danielwangai

Hi,
First, thank you for the Solana boot camp resources. They are phenomenal.

I noticed something in project 4, crud-app. In the video, the journal entry has title and message fields with length constraints:

#[account]
pub struct JournalEntryState {
    pub owner: Pubkey,
    pub title: String, // max length of 50
    pub message: String, // max length of 1000
}
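For reference, the space such an account needs can be worked out by hand. The sketch below assumes the usual Anchor/Borsh layout (an 8-byte account discriminator, a 32-byte Pubkey, and a 4-byte length prefix in front of each String) and treats the 50 and 1000 limits as byte counts, not character counts. The helper name journalEntrySpace is hypothetical.

```typescript
// Borsh layout sizes in bytes (assumed Anchor conventions).
const DISCRIMINATOR = 8; // Anchor account discriminator
const PUBKEY = 32; // owner
const STRING_PREFIX = 4; // u32 length prefix stored before the UTF-8 bytes

// Hypothetical helper: total bytes to allocate for a JournalEntryState account.
function journalEntrySpace(maxTitleBytes: number, maxMessageBytes: number): number {
  return (
    DISCRIMINATOR +
    PUBKEY +
    (STRING_PREFIX + maxTitleBytes) +
    (STRING_PREFIX + maxMessageBytes)
  );
}

// UTF-8 byte length, which is what actually counts against the limit.
const utf8Length = (s: string): number => new TextEncoder().encode(s).length;

console.log(journalEntrySpace(50, 1000)); // 1098
console.log(utf8Length("é")); // 2 (one character, two bytes)
```

Since a non-ASCII character can take up to 4 bytes, a "1000 character" message may need more than 1000 bytes of space.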

After implementing the CreateEntry account and its instruction create_journal_entry, it builds fine, with no errors.

I then wrote a test to check whether the program rejects a message longer than 1000 characters:

it("rejects message longer than 1000 characters", async () => {
      await airdrop(bob.publicKey);
      try {
        let title = "unique title";
        let longMessage = "x".repeat(1001);
        const [pda] = getJournalEntryAddress(
          title,
          bob.publicKey,
          program.programId,
        );
        await program.methods
          .createJournalEntry(title, longMessage)
          .accounts({
            journalEntry: pda,
            owner: bob.publicKey,
            systemProgram: anchor.web3.SystemProgram.programId,
          })
          .signers([bob])
          .rpc({ commitment: "confirmed" });
      } catch (error) {
        console.log("ERROR>>: ", error);
      }
    });

// get pda journal entry
const getJournalEntryAddress = (
    title: string,
    owner: PublicKey,
    programId: PublicKey,
  ) => {
    return anchor.web3.PublicKey.findProgramAddressSync(
      [anchor.utils.bytes.utf8.encode(title), owner.toBuffer()],
      programId,
    );
  };

I get this error:

RangeError: encoding overruns Buffer
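This error most likely comes from the client-side encoder, not from the program. As far as I know, the Anchor TypeScript client writes instruction arguments into a fixed 1000-byte buffer, but treat that figure as an assumption to verify against your Anchor version. A rough sketch of the encoded size, assuming an 8-byte instruction discriminator and Borsh strings (4-byte length prefix plus UTF-8 bytes); instructionDataSize and ASSUMED_ENCODER_BUFFER are hypothetical names:

```typescript
// Assumed size of the client's instruction-encoding buffer (verify for your Anchor version).
const ASSUMED_ENCODER_BUFFER = 1000;

const utf8Length = (s: string): number => new TextEncoder().encode(s).length;

// Hypothetical helper: bytes of instruction data for createJournalEntry(title, message).
function instructionDataSize(title: string, message: string): number {
  const discriminator = 8; // Anchor instruction discriminator
  return discriminator + (4 + utf8Length(title)) + (4 + utf8Length(message));
}

const long = instructionDataSize("unique title", "x".repeat(1001));
const shorter = instructionDataSize("unique title", "x".repeat(900));

console.log(long, long > ASSUMED_ENCODER_BUFFER); // 1029 true
console.log(shorter, shorter > ASSUMED_ENCODER_BUFFER); // 928 false
```

Under these assumptions the 1001-character message overflows the buffer during encoding, while the 900-character one encodes successfully and fails at a later stage.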

I then reduced the message in the test to 900 characters and got a different error:

Transaction too large: 1260 > 1232
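The 1232 figure is Solana's packet size limit: the IPv6 minimum MTU of 1280 bytes, minus 40 bytes for the IPv6 header and 8 bytes for the fragment header. The sketch below estimates the size of a legacy transaction from its shape. The inputs are assumptions about this test (two signers, five account keys, one instruction with three accounts, and 928 bytes of data for a 12-byte title and a 900-byte message), so the result is approximate and may differ from the client's exact count by a few bytes. estimateLegacyTxSize is a hypothetical helper.

```typescript
// Packet budget: IPv6 minimum MTU minus IPv6 header minus fragment header.
const PACKET_DATA_SIZE = 1280 - 40 - 8; // 1232

// Byte length of a compact-u16 ("shortvec") length prefix for a given count.
const compactU16Len = (n: number): number => (n < 0x80 ? 1 : n < 0x4000 ? 2 : 3);

interface IxShape {
  numAccounts: number; // account indexes referenced by the instruction
  dataLen: number; // bytes of instruction data
}

// Hypothetical helper: approximate serialized size of a legacy transaction.
function estimateLegacyTxSize(
  numSignatures: number,
  numAccountKeys: number,
  instructions: IxShape[],
): number {
  const signatures = compactU16Len(numSignatures) + 64 * numSignatures;
  const header = 3; // signer and read-only counts
  const keys = compactU16Len(numAccountKeys) + 32 * numAccountKeys;
  const blockhash = 32;
  const ixs = instructions.reduce(
    (sum, ix) =>
      sum +
      1 + // program id index
      compactU16Len(ix.numAccounts) +
      ix.numAccounts +
      compactU16Len(ix.dataLen) +
      ix.dataLen,
    compactU16Len(instructions.length),
  );
  return signatures + header + keys + blockhash + ixs;
}

const estimate = estimateLegacyTxSize(2, 5, [{ numAccounts: 3, dataLen: 928 }]);
console.log(estimate > PACKET_DATA_SIZE); // true
```

With roughly 300 bytes going to signatures, account keys, and the blockhash, the practical ceiling for a single instruction's string arguments is well below 1000 bytes.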

So the test fails when trying to store long strings, and both failures happen on the client, before the program's own length check ever runs. This got me curious about how to handle really large content on-chain.

One approach I tried was hashing the message and storing only the hash on-chain. The downside is that hashing is one-way: you can't recover the original message from the hash, so anyone who fetches the account directly from the blockchain sees only a hash value, which isn't human-readable or useful on its own. To make it practical, I stored the hash on-chain and kept the full content off-chain.

This adds some complexity: you need to hash the message on every write, and you must keep the on-chain hash and the off-chain content in sync during updates and deletes. The trade-off is worth it, though. Because the on-chain data stays a fixed-size hash regardless of message length, I was able to handle strings as long as 3000 characters.
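A minimal sketch of the client-side half of this approach, assuming SHA-256 as the hash (the post doesn't say which function was used) and Node's built-in crypto module. hashMessage and matchesOnChainHash are hypothetical names, and where the off-chain content lives is left open.

```typescript
import { createHash } from "node:crypto";

// Hash the full message; the 32-byte digest is what would be stored on-chain.
function hashMessage(message: string): Buffer {
  return createHash("sha256").update(message, "utf8").digest();
}

// Check that content fetched from off-chain storage still matches the on-chain hash.
function matchesOnChainHash(message: string, onChainHash: Uint8Array): boolean {
  return hashMessage(message).equals(Buffer.from(onChainHash));
}

const message = "x".repeat(3000);
const digest = hashMessage(message);

console.log(digest.length); // 32, regardless of message length
console.log(matchesOnChainHash(message, digest)); // true
console.log(matchesOnChainHash(message + "!", digest)); // false
```

Verifying on read is what makes the off-chain copy trustworthy: if the stored content is altered or gets out of sync, the comparison fails.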

cc @brimigs
