
Regression in 25.8.16: Crash when reading Object(JSON) from Compact MergeTree parts #1412

@CarlosFelipeOR

Description

I checked the Altinity Stable Builds lifecycle table, and the Altinity Stable Build
version I'm using is still supported.

Type of problem

Bug report - something's broken

Describe the situation

A regression was introduced in PR #1407, which changed the default value of write_marks_for_substreams_in_compact_parts from true to false.
When this setting is false, the ClickHouse server crashes with SIGABRT when reading data from tables that contain Array(Object('json')) columns with nested array structures inside the JSON objects.
The crash occurs during deserialization of tuple elements: the reader expects per-substream marks that don't exist when the setting is disabled.

Error message:

Logical error: 'Unexpected size of tuple element 1: 0. Expected size: 1'.

This issue:

  • Is reproducible with debug builds (assertions enabled)
  • Affects tables with Array(Object('json')) containing nested arrays
  • Does not occur when write_marks_for_substreams_in_compact_parts=true

How to reproduce the behavior

Environment

  • Version: 25.8.16.10001.altinitytest (debug build)
  • Build type: Debug (required to trigger the assertion)

Option 1: Using the debug binary

Download the debug binary from CI artifacts:

wget https://altinity-build-artifacts.s3.amazonaws.com/PRs/1407/98b1107b14d7fe362c5619374621bcc6efde9477/build_amd_debug/clickhouse
chmod +x clickhouse
mv clickhouse clickhouse-debug

./clickhouse-debug server

Option 2: Using stateless test

Run the existing stateless test 01825_type_json_in_array, which covers this scenario.


Manual reproduction steps

Connect to the server and execute:

SET allow_experimental_object_type = 1;
DROP TABLE IF EXISTS t_json_complex;
CREATE TABLE t_json_complex (id UInt32, arr Array(Object('json')))
ENGINE = MergeTree ORDER BY id;

-- Insert data with nested arrays inside JSON objects
INSERT INTO t_json_complex FORMAT JSONEachRow {"id": 1, "arr": [{"k1": [{"k2": "aaa", "k3": "bbb"}, {"k2": "ccc"}]}]}
INSERT INTO t_json_complex FORMAT JSONEachRow {"id": 2, "arr": [{"k1": [{"k3": "ddd", "k4": 10}, {"k4": 20}], "k5": {"k6": "foo"}}]}

-- This query crashes the server
SELECT id, arr.k1.k2, arr.k1.k3, arr.k1.k4, arr.k5.k6 FROM t_json_complex ORDER BY id;

Expected behavior

The SELECT query should return the nested JSON data correctly:

  ┌─id─┬─arr.k1.k2─────────┬─arr.k1.k3─────────┬─arr.k1.k4─┬─arr.k5.k6─┐
  │  1 │ [['aaa','ccc']]   │ [['bbb','']]      │ [[0,0]]   │ ['']      │
  │  2 │ [['','']]         │ [['ddd','']]      │ [[10,20]] │ ['foo']   │
  └────┴───────────────────┴───────────────────┴───────────┴───────────┘

Actual behavior

The server crashes with SIGABRT:

2026.02.17 02:25:21.414533 [ 1169016 ] {37b7e5de-6cdf-4b14-a514-2d567821155b} <Fatal> : Logical error: 'Unexpected size of tuple element 1: 0. Expected size: 1'.
2026.02.17 02:25:21.449885 [ 1169016 ] {37b7e5de-6cdf-4b14-a514-2d567821155b} <Fatal> : Stack trace (when copying this message, always include the lines below):

0. /home/ubuntu/_work/ClickHouse/ClickHouse/contrib/llvm-project/libcxx/include/__exception/exception.h:113: Poco::Exception::Exception(String const&, int) @ 0x000000002755deb2
1. /home/ubuntu/_work/ClickHouse/ClickHouse/src/Common/Exception.cpp:128: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x00000000145ea2e9
2. /home/ubuntu/_work/ClickHouse/ClickHouse/src/Common/Exception.h:123: DB::Exception::Exception(String&&, int, String, bool) @ 0x000000000d24e18e
3. /home/ubuntu/_work/ClickHouse/ClickHouse/src/Common/Exception.h:58: DB::Exception::Exception(PreformattedMessage&&, int) @ 0x000000000d24db91
4. /home/ubuntu/_work/ClickHouse/ClickHouse/src/Common/Exception.h:141: DB::Exception::Exception<unsigned long&, unsigned long, unsigned long&>(int, FormatStringHelperImpl<std::type_identity<unsigned long&>::type, std::type_identity<unsigned long>::type, std::type_identity<unsigned long&>::type>, unsigned long&, unsigned long&&, unsigned long&) @ 0x000000001a6a9336
5. /home/ubuntu/_work/ClickHouse/ClickHouse/src/DataTypes/Serializations/SerializationTuple.cpp:810: DB::SerializationTuple::deserializeBinaryBulkWithMultipleStreams(COW<DB::IColumn>::immutable_ptr<DB::IColumn>&, unsigned long, unsigned long, DB::ISerialization::DeserializeBinaryBulkSettings&, std::shared_ptr<DB::ISerialization::DeserializeBinaryBulkState>&, std::unordered_map<String, std::unique_ptr<DB::ISerialization::ISubstreamsCacheElement, std::default_delete<DB::ISerialization::ISubstreamsCacheElement>>, std::hash<String>, std::equal_to<String>, std::allocator<std::pair<String const, std::unique_ptr<DB::ISerialization::ISubstreamsCacheElement, std::default_delete<DB::ISerialization::ISubstreamsCacheElement>>>>>*) const @ 0x000000001a6d50d6
6. /home/ubuntu/_work/ClickHouse/ClickHouse/src/DataTypes/Serializations/SerializationArray.cpp:492: DB::SerializationArray::deserializeBinaryBulkWithMultipleStreams(COW<DB::IColumn>::immutable_ptr<DB::IColumn>&, unsigned long, unsigned long, DB::ISerialization::DeserializeBinaryBulkSettings&, std::shared_ptr<DB::ISerialization::DeserializeBinaryBulkState>&, std::unordered_map<String, std::unique_ptr<DB::ISerialization::ISubstreamsCacheElement, std::default_delete<DB::ISerialization::ISubstreamsCacheElement>>, std::hash<String>, std::equal_to<String>, std::allocator<std::pair<String const, std::unique_ptr<DB::ISerialization::ISubstreamsCacheElement, std::default_delete<DB::ISerialization::ISubstreamsCacheElement>>>>>*) const @ 0x000000001a5df852
7. /home/ubuntu/_work/ClickHouse/ClickHouse/src/Storages/MergeTree/MergeTreeReaderCompact.cpp:248: DB::MergeTreeReaderCompact::readData(unsigned long, COW<DB::IColumn>::immutable_ptr<DB::IColumn>&, unsigned long, unsigned long, unsigned long, unsigned long, DB::MergeTreeReaderStream&, std::unordered_map<String, COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::hash<String>, std::equal_to<String>, std::allocator<std::pair<String const, COW<DB::IColumn>::immutable_ptr<DB::IColumn>>>>&, std::unordered_map<String, COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::hash<String>, std::equal_to<String>, std::allocator<std::pair<String const, COW<DB::IColumn>::immutable_ptr<DB::IColumn>>>>*, std::unordered_map<String, std::unique_ptr<DB::ISerialization::ISubstreamsCacheElement, std::default_delete<DB::ISerialization::ISubstreamsCacheElement>>, std::hash<String>, std::equal_to<String>, std::allocator<std::pair<String const, std::unique_ptr<DB::ISerialization::ISubstreamsCacheElement, std::default_delete<DB::ISerialization::ISubstreamsCacheElement>>>>>*) @ 0x000000001eed960d
8. /home/ubuntu/_work/ClickHouse/ClickHouse/src/Storages/MergeTree/MergeTreeReaderCompactSingleBuffer.cpp:70: DB::MergeTreeReaderCompactSingleBuffer::readRows(unsigned long, unsigned long, bool, unsigned long, unsigned long, std::vector<COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::immutable_ptr<DB::IColumn>>>&) @ 0x000000001eedee93
9. /home/ubuntu/_work/ClickHouse/ClickHouse/src/Storages/MergeTree/MergeTreeRangeReader.cpp:127: DB::MergeTreeRangeReader::DelayedStream::finalize(std::vector<COW<DB::IColumn>::immutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::immutable_ptr<DB::IColumn>>>&) @ 0x000000001eeca184
10. /home/ubuntu/_work/ClickHouse/ClickHouse/src/Storages/MergeTree/MergeTreeRangeReader.cpp:309: DB::MergeTreeRangeReader::startReadingChain(unsigned long, DB::MarkRanges&) @ 0x000000001eed1ec3
11. /home/ubuntu/_work/ClickHouse/ClickHouse/src/Storages/MergeTree/MergeTreeReadersChain.cpp:68: DB::MergeTreeReadersChain::read(unsigned long, DB::MarkRanges&, std::vector<DB::MarkRanges, std::allocator<DB::MarkRanges>>&) @ 0x000000001eef8715
12. /home/ubuntu/_work/ClickHouse/ClickHouse/src/Storages/MergeTree/MergeTreeReadTask.cpp:229: DB::MergeTreeReadTask::read() @ 0x000000001eef60b3
13. /home/ubuntu/_work/ClickHouse/ClickHouse/src/Storages/MergeTree/MergeTreeSelectAlgorithms.h:53: DB::MergeTreeInOrderSelectAlgorithm::readFromTask(DB::MergeTreeReadTask&) @ 0x000000001fab9b0c
14. /home/ubuntu/_work/ClickHouse/ClickHouse/src/Storages/MergeTree/MergeTreeSelectProcessor.cpp:234: DB::MergeTreeSelectProcessor::read() @ 0x000000001ef08112
15. /home/ubuntu/_work/ClickHouse/ClickHouse/src/Storages/MergeTree/MergeTreeSource.cpp:229: DB::MergeTreeSource::tryGenerate() @ 0x000000001faa51a9
16. /home/ubuntu/_work/ClickHouse/ClickHouse/src/Processors/ISource.cpp:110: DB::ISource::work() @ 0x000000001f4dc742
17. /home/ubuntu/_work/ClickHouse/ClickHouse/src/Processors/Executors/ExecutionThreadContext.cpp:53: DB::ExecutionThreadContext::executeTask() @ 0x000000001f4f9950
18. /home/ubuntu/_work/ClickHouse/ClickHouse/src/Processors/Executors/PipelineExecutor.cpp:351: DB::PipelineExecutor::executeStepImpl(unsigned long, DB::IAcquiredSlot*, std::atomic<bool>*) @ 0x000000001f4ebbc5
19. /home/ubuntu/_work/ClickHouse/ClickHouse/src/Processors/Executors/PipelineExecutor.cpp:279: DB::PipelineExecutor::executeSingleThread(unsigned long, DB::IAcquiredSlot*) @ 0x000000001f4ec129
20. /home/ubuntu/_work/ClickHouse/ClickHouse/src/Processors/Executors/PipelineExecutor.cpp:565: void std::__function::__policy_invoker<void ()>::__call_impl[abi:se190107]<std::__function::__default_alloc_func<DB::PipelineExecutor::spawnThreads(std::shared_ptr<DB::IAcquiredSlot>)::$_0, void ()>>(std::__function::__policy_storage const*) @ 0x000000001f4ed1c3
21. /home/ubuntu/_work/ClickHouse/ClickHouse/contrib/llvm-project/libcxx/include/__functional/function.h:716: ? @ 0x0000000014738b53
22. /home/ubuntu/_work/ClickHouse/ClickHouse/contrib/llvm-project/libcxx/include/__type_traits/invoke.h:117: ThreadFromGlobalPoolImpl<false, true>::ThreadFromGlobalPoolImpl<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&&)::'lambda'()::operator()() @ 0x000000001473f226
23. /home/ubuntu/_work/ClickHouse/ClickHouse/contrib/llvm-project/libcxx/include/__functional/function.h:716: ? @ 0x0000000014735fe6
24. /home/ubuntu/_work/ClickHouse/ClickHouse/contrib/llvm-project/libcxx/include/__type_traits/invoke.h:117: void* std::__thread_proxy[abi:se190107]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>>(void*) @ 0x000000001473ca00
25. start_thread @ 0x000000000009caa4
26. clone3 @ 0x0000000000129c6c

2026.02.17 02:25:21.450091 [ 1167620 ] {} <Trace> BaseDaemon: Received signal 6
2026.02.17 02:25:21.450122 [ 1167620 ] {} <Fatal> BaseDaemon: ########## Short fault info ############
2026.02.17 02:25:21.450129 [ 1167620 ] {} <Fatal> BaseDaemon: (version 25.8.16.10001.altinitytest, build id: 5E3C8A71D175863BBF837F2D50BFBE92DC49FDA3, git hash: 84102805cd7eacfd49b38f81ca46d868189c5b82, architecture: x86_64) (from thread 1169016) Received signal 6
2026.02.17 02:25:21.450130 [ 1167620 ] {} <Fatal> BaseDaemon: Signal description: Aborted


...
(query: SELECT id, arr.k1.k2, arr.k1.k3, arr.k1.k4, arr.k5.k6 FROM t_json_complex ORDER BY id;)
Received signal Aborted (6)

Root cause analysis

The crash originates in src/DataTypes/Serializations/SerializationTuple.cpp:810:

if (column_tuple.getColumn(i).size() != expected_size)
    throw Exception(ErrorCodes::LOGICAL_ERROR,
        "Unexpected size of tuple element {}: {}. Expected size: {}",
        i, column_tuple.getColumn(i).size(), expected_size);

When write_marks_for_substreams_in_compact_parts=false:

  • Compact parts use .mrk3 format (column-level marks only)
  • Nested JSON arrays with tuples require per-substream marks (.mrk4 format)
  • Without per-substream marks, the deserializer cannot correctly position each substream
  • This causes size mismatches between tuple elements during reading
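The mark-format mismatch above only applies to Compact parts. To confirm that the affected table is actually using Compact parts on a live server, a query against the standard system.parts table can be used (the table name below is from the repro above):

```sql
-- Check the storage format of the active parts of the repro table;
-- the bug only manifests for Compact parts (Wide parts are unaffected).
SELECT name, part_type
FROM system.parts
WHERE table = 't_json_complex' AND active;
```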

Workaround

Set write_marks_for_substreams_in_compact_parts=true at table creation:

CREATE TABLE t_json_complex (id UInt32, arr Array(Object('json')))
ENGINE = MergeTree ORDER BY id
SETTINGS write_marks_for_substreams_in_compact_parts = true;

Or alter existing tables:

ALTER TABLE t_json_complex MODIFY SETTING write_marks_for_substreams_in_compact_parts = true;

Note: existing parts written while the setting was false will still be unreadable; the setting only affects newly written parts.
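
To check which default is in effect on a given server, the setting can be inspected through system.merge_tree_settings (assuming the build exposes this setting there, which is the case for MergeTree-level settings):

```sql
-- Shows the server-wide default and whether it was changed from the compiled-in value
SELECT name, value, changed
FROM system.merge_tree_settings
WHERE name = 'write_marks_for_substreams_in_compact_parts';
```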


Additional context

Test cases by complexity

  Case          Structure                                                Crashes?
  Simple        Object('json') with flat fields                          No
  Intermediate  Array(Object('json')) with flat fields                   No
  Complex       Array(Object('json')) with nested arrays inside          Yes

Related PR

  • #1407 — changed the default value of write_marks_for_substreams_in_compact_parts