@ReallyNiceGuy

According to:
FM94-BUFR Encoding and Decoding Software
User Guidelines
Version 1.6
For BUFR Software Version 3.1

The header for BUFR version 2 doesn't have a seconds field. Extra bytes are allowed for local use.

@ywangd
Owner

ywangd commented Mar 25, 2025

Do you mind providing a sample file and adding a test for it?

@ReallyNiceGuy
Author

I was creating the test here and found the following weird thing:
If section 1 has an odd size, encoding adds an '\x00' at the end and the section size becomes even, but bufr_message.local_bytes.value is as expected.
When reading it back, the extra '\x00' is included in bufr_message.local_bytes.value, so it no longer matches the expected value.
This seems to happen only on version 2. I tried an odd size on version 4 and it passed the test without issues.

This passes:

from pybufrkit.decoder import Decoder
from pybufrkit.encoder import Encoder

data = [
    [
        b'BUFR',
        0,  # file length (will be calculated)
        2
    ],
    [0,  # section length (will be calculated)
     0,  # master table
     0,  # centre
     0,  # sequence number
     False,  # has section 2 (no)
     '0000000',  # flag bits
     6,  # data category
     0,  # data local subcategory
     11,  # master table version
     10,  # local table version
     25,  # year
     3,  # month
     25,  # day
     13,  # hour
     45,  # min
     b'test1',  # Extra bytes
    ],
    [0,  # section length (will be calculated)
     '00000000',  # reserved bits
     1,  # subsets
     True,  # is observation
     False,  # is compressed
     '000000',  # flag bits
     # Definition follows
     []
     ],
    [
        0,  # section length (will be calculated)
        '00000000',  # flag bits
        [
            [
            ]  # flat data
        ]
    ],
    [b'7777']
]


def test_optional_parameter_v2():
    encoder = Encoder()
    bufr_message = encoder.process(data)
    assert bufr_message.sections[1].section_length.value == 22
    assert bufr_message.local_bytes.value == b'test1'

    decoder = Decoder()
    decoded = decoder.process(bufr_message.serialized_bytes)
    assert decoded.sections[1].section_length.value == 22
    assert decoded.local_bytes.value == b'test1'

This fails:

from pybufrkit.decoder import Decoder
from pybufrkit.encoder import Encoder

data = [
    [
        b'BUFR',
        0,  # file length (will be calculated)
        2
    ],
    [0,  # section length (will be calculated)
     0,  # master table
     0,  # centre
     0,  # sequence number
     False,  # has section 2 (no)
     '0000000',  # flag bits
     6,  # data category
     0,  # data local subcategory
     11,  # master table version
     10,  # local table version
     25,  # year
     3,  # month
     25,  # day
     13,  # hour
     45,  # min
     b'test',  # Extra bytes
    ],
    [0,  # section length (will be calculated)
     '00000000',  # reserved bits
     1,  # subsets
     True,  # is observation
     False,  # is compressed
     '000000',  # flag bits
     # Definition follows
     []
     ],
    [
        0,  # section length (will be calculated)
        '00000000',  # flag bits
        [
            [
            ]  # flat data
        ]
    ],
    [b'7777']
]


def test_optional_parameter_v2():
    encoder = Encoder()
    bufr_message = encoder.process(data)
    assert bufr_message.sections[1].section_length.value == 21
    assert bufr_message.local_bytes.value == b'test'

    decoder = Decoder()
    decoded = decoder.process(bufr_message.serialized_bytes)
    assert decoded.sections[1].section_length.value == 21
    assert decoded.local_bytes.value == b'test'

@ywangd
Owner

ywangd commented Mar 26, 2025

This is expected. See this code comment:

# For edition 3 and earlier, ensure each section has an even number of octets.
# This is done by padding Zeros to the required number of octets.

In the same doc that you were referring to, section 2.2 says the following:

Each of the sections of a BUFR message is made up of a series of octets. The term octet,
meaning 8 bits, was coined to qualify one byte as an 8-bit sequence. An individual
section shall always consist of an even number of octets, with extra bits added on and
set to zero when necessary.
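
To make the effect on the two tests above concrete, here is a minimal sketch of the even-octet rule in plain Python. It does not use pybufrkit; `pad_to_even`, `local_bytes_of`, and `FIXED_OCTETS` are hypothetical names for illustration only. The value 17 for the fixed part of an edition 2 section 1 is an assumption inferred from the tests (22 - 5 and 21 - 4).

```python
# Assumption: an edition 2 section 1 has 17 fixed octets before the local-use bytes
# (inferred from the tests above: 22 - len(b'test1') == 21 - len(b'test') == 17).
FIXED_OCTETS = 17


def pad_to_even(section: bytes) -> bytes:
    """Append one zero octet if the section has an odd number of octets."""
    return section + b"\x00" if len(section) % 2 else section


def local_bytes_of(section: bytes) -> bytes:
    """Everything after the fixed part is read back as local-use bytes."""
    return section[FIXED_OCTETS:]


for local in (b"test1", b"test"):
    raw = bytes(FIXED_OCTETS) + local  # zero-filled stand-in for the fixed octets
    padded = pad_to_even(raw)
    print(len(raw), len(padded), local_bytes_of(padded))
```

With `b'test1'` the section is already 22 octets and nothing changes. With `b'test'` the 21-octet section is padded to 22, and a reader that treats everything after the fixed part as local bytes sees `b'test\x00'`.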

@ReallyNiceGuy
Author

Thank you! I can almost read a technical document ;)

I will finish the test cases and update the commit.

@ywangd
Owner

ywangd commented Mar 27, 2025

Thanks for the update

@ywangd ywangd merged commit 2087ef8 into ywangd:master Mar 27, 2025
5 checks passed
@ReallyNiceGuy ReallyNiceGuy deleted the fix_bufr_v2 branch May 22, 2025 08:28