[Parquet] UB in TypedColumnWriterImpl::UpdateLevelHistogram due to std::span construction from nullptr #49928

@K-ballo

Description

Describe the bug, including details regarding any error messages, version, and platform.

In Arrow 24.0.0, parquet::TypedColumnWriterImpl::UpdateLevelHistogram can construct a std::span from a null pointer with a nonzero size. This is undefined behavior and crashes with libstdc++:

      auto add_levels = [](std::vector<int64_t>& level_histogram,
                           std::span<const int16_t> levels, int16_t max_level) {
        if (max_level == 0) {
          return;
        }
        ARROW_DCHECK_EQ(static_cast<size_t>(max_level) + 1, level_histogram.size());
        ::parquet::UpdateLevelHistogram(levels, level_histogram);
      };
      add_levels(page_size_statistics_->definition_level_histogram,
                 {def_levels, static_cast<size_t>(num_levels)},
                 descr_->max_definition_level());
      add_levels(page_size_statistics_->repetition_level_histogram,
                 {rep_levels, static_cast<size_t>(num_levels)},
                 descr_->max_repetition_level());
    }

Observed with rep_levels == nullptr, num_levels == 3, and max_level == 0.
Note that the max_level == 0 guard inside the lambda comes too late to prevent the UB: the std::span argument is constructed at the call site, before the lambda body runs.

Component(s)

Parquet
