This repository was archived by the owner on Mar 19, 2024. It is now read-only.

Ignore invalid utf8 characters while decoding#141

Open
devbugging wants to merge 1 commit into master from gregor/utf8-decode

Conversation

@devbugging


The GCP streamer failed to decode block data that contained invalid UTF-8 characters. The explanation of why that data was included is below (copied from Slack):

I’ve looked into this and it seems the issue is caused by the transaction with ID f6c8e65646a3b140902aa7559ae2e740bbe92fbef65f414a441c141340a5756f, more specifically by the last argument of the transaction. If you look closely, the whitespace before the address is not actual whitespace but an invalid UTF-8 character. On further investigation I found it is a BOM character: https://en.wikipedia.org/wiki/Byte_order_mark
The problem is that CBOR encoding/decoding assumes UTF-8 validity, which breaks in this case. This is the first time (to my limited knowledge) that tx args are CBOR encoded/decoded, and since tx args are provided by user input, a user can make the RN node fail with the current settings. Setting the CBOR decoding flag to allow invalid UTF-8 characters would fix this, and now that I understand the issue I feel it would be an OK fix. So it is not a bug in the uploader producing malformed data; it is invalid input from the user.

This change allows such characters during decoding and thus avoids the failure.
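
For illustration only, here is a minimal sketch of the difference between a strict and a lenient text decode, using only the Go standard library. It is not the code changed in this PR and it does not use the CBOR library. The `decodeText` helper, its `allowInvalid` parameter, and the sample bytes are hypothetical stand-ins for the decoder's UTF-8 validation step and for the offending transaction argument.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// errInvalidUTF8 is returned by decodeText in strict mode.
var errInvalidUTF8 = errors.New("text string is not valid UTF-8")

// decodeText is a hypothetical stand-in for the step where a decoder turns
// the raw bytes of a text string into a Go string. In strict mode it rejects
// bytes that are not valid UTF-8; in lenient mode it passes them through.
func decodeText(raw []byte, allowInvalid bool) (string, error) {
	if !allowInvalid && !utf8.Valid(raw) {
		return "", errInvalidUTF8
	}
	return string(raw), nil
}

func main() {
	// Made-up argument: two bytes that are not valid UTF-8 (0xFF, 0xFE)
	// followed by an address-like string. The real transaction bytes are
	// not reproduced here.
	arg := append([]byte{0xFF, 0xFE}, []byte("0x01")...)

	// Strict mode fails, which mirrors the decode failure described above.
	_, err := decodeText(arg, false)
	fmt.Println("strict:", err)

	// Lenient mode keeps the bytes as they are and does not fail.
	s, err := decodeText(arg, true)
	fmt.Println("lenient:", len(s), err)
}
```

The lenient path keeps the original bytes untouched, so user-supplied data is preserved rather than rewritten; the trade-off is that downstream consumers can no longer assume every string is valid UTF-8.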

Misc

  • PR title will be clear as part of the changelog
  • PR is against the correct branch
  • PR is labelled appropriately
  • PR is linked to an issue

@devbugging devbugging self-assigned this Aug 9, 2023


2 participants