Fix header for BUFR version 2 #43

ReallyNiceGuy · 2025-03-24T15:45:04Z

According to the FM94-BUFR Encoding and Decoding Software User Guidelines, Version 1.6, for BUFR Software Version 3.1:

The header for BUFR version 2 doesn't have a "second" field. Extra bytes are allowed for local use.

ywangd · 2025-03-25T11:36:29Z

Do you mind providing a sample file and adding a test for it?

ReallyNiceGuy · 2025-03-25T13:18:01Z

While creating the test, I found something odd:
If section 1 has an odd size, encoding appends a '\x00' at the end so that the section size becomes even, but bufr_message.local_bytes.value is still as expected.
When the message is read back, the extra '\x00' is included in bufr_message.local_bytes.value, so it no longer matches the expected value.
This seems to happen only with version 2. I tried an odd size with version 4 and the test passed without issues.

This passes:

from pybufrkit.decoder import Decoder
from pybufrkit.encoder import Encoder

data = [
    [
        b'BUFR',
        0,  # file length (will be calculated)
        2
    ],
    [0,  # section length (will be calculated)
     0,  # master table
     0,  # centre
     0,  # sequence number
     False,  # has section 2 (no)
     '0000000',  # flag bits
     6,  # data category
     0,  # data local subcategory
     11,  # master table version
     10,  # local table version
     25,  # year
     3,  # month
     25,  # day
     13,  # hour
     45,  # min
     b'test1',  # Extra bytes
    ],
    [0,  # section length (will be calculated)
     '00000000',  # reserved bits
     1,  # subsets
     True,  # is observation
     False,  # is compressed
     '000000',  # flag bits
     # Definition follows
     []
     ],
    [
        0,  # section length (will be calculated)
        '00000000',  # flag bits
        [
            [
            ]  # flat data
        ]
    ],
    [b'7777']
]


def test_optional_parameter_v2():
    encoder = Encoder()
    bufr_message = encoder.process(data)
    assert bufr_message.sections[1].section_length.value == 22
    assert bufr_message.local_bytes.value == b'test1'

    decoder = Decoder()
    decoded = decoder.process(bufr_message.serialized_bytes)
    assert decoded.sections[1].section_length.value == 22
    assert decoded.local_bytes.value == b'test1'

This fails:

from pybufrkit.decoder import Decoder
from pybufrkit.encoder import Encoder

data = [
    [
        b'BUFR',
        0,  # file length (will be calculated)
        2
    ],
    [0,  # section length (will be calculated)
     0,  # master table
     0,  # centre
     0,  # sequence number
     False,  # has section 2 (no)
     '0000000',  # flag bits
     6,  # data category
     0,  # data local subcategory
     11,  # master table version
     10,  # local table version
     25,  # year
     3,  # month
     25,  # day
     13,  # hour
     45,  # min
     b'test',  # Extra bytes
    ],
    [0,  # section length (will be calculated)
     '00000000',  # reserved bits
     1,  # subsets
     True,  # is observation
     False,  # is compressed
     '000000',  # flag bits
     # Definition follows
     []
     ],
    [
        0,  # section length (will be calculated)
        '00000000',  # flag bits
        [
            [
            ]  # flat data
        ]
    ],
    [b'7777']
]


def test_optional_parameter_v2():
    encoder = Encoder()
    bufr_message = encoder.process(data)
    assert bufr_message.sections[1].section_length.value == 21
    assert bufr_message.local_bytes.value == b'test'

    decoder = Decoder()
    decoded = decoder.process(bufr_message.serialized_bytes)
    assert decoded.sections[1].section_length.value == 21
    assert decoded.local_bytes.value == b'test'

ywangd · 2025-03-26T09:20:31Z

This is expected. See this code comment in pybufrkit/pybufrkit/encoder.py (lines 169 to 170 in 5128e2e):

    # For edition 3 and earlier, ensure each section has an even number of octets.
    # This is done by padding Zeros to the required number of octets.

The same document you referred to says the following in section 2.2:

Each of the sections of a BUFR message is made up of a series of octets. The term octet,
meaning 8 bits, was coined to qualify one byte as an 8-bit sequence. An individual
section shall always consist of an even number of octets, with extra bits added on and
set to zero when necessary.
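
The effect of this rule on the two tests above can be sketched in plain Python. This is an illustration under stated assumptions, not pybufrkit code: `pad_to_even`, `section1_length`, `decode_local_bytes`, and `FIXED_OCTETS` are hypothetical names, and the value 17 is inferred from the passing test (22 octets with 5 local bytes).

```python
FIXED_OCTETS = 17  # assumed fixed part of section 1 in edition 2


def pad_to_even(section: bytes) -> bytes:
    """Append one zero octet when the section has an odd number of octets."""
    return section + b"\x00" if len(section) % 2 else section


def section1_length(local_bytes: bytes) -> int:
    """Section length after padding, for a given local-use payload."""
    unpadded = FIXED_OCTETS + len(local_bytes)
    return unpadded + (unpadded % 2)


def decode_local_bytes(section: bytes) -> bytes:
    """A reader only sees the section length, so padding looks like payload."""
    return section[FIXED_OCTETS:]


fixed = bytes(FIXED_OCTETS)               # placeholder for the fixed fields
odd = pad_to_even(fixed + b"test")        # 21 octets, padded to 22
even = pad_to_even(fixed + b"test1")      # already 22 octets, unchanged
```

With `b'test'` the unpadded size is 21, so one zero octet is appended, and a reader that takes everything after the fixed fields gets `b'test\x00'`. With `b'test1'` the size is already even and the payload round-trips unchanged.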

ReallyNiceGuy · 2025-03-26T09:31:19Z

Thank you! I can almost read a technical document ;)

I will finish the test cases and update the commit.

ywangd · 2025-03-27T11:21:55Z

Thanks for the update

ReallyNiceGuy force-pushed the fix_bufr_v2 branch from 021f0b8 to 67cb0e2 Compare March 24, 2025 15:47

Fix header for BUFR version 2

425d4a5

According to: FM94-BUFR Encoding and Decoding Software User Guidelines Version 1.6 For BUFR Software Version 3.1 The header for BUFR version 2 doesn't have second field. Extra bytes are allowed for local use

ReallyNiceGuy force-pushed the fix_bufr_v2 branch from 67cb0e2 to 425d4a5 Compare March 26, 2025 09:39

ywangd merged commit 2087ef8 into ywangd:master Mar 27, 2025
5 checks passed

ReallyNiceGuy deleted the fix_bufr_v2 branch May 22, 2025 08:28
