Commit 999aec6
refactor: optimize parquet library read/write performance
- Batch all binary encode/decode functions to operate on arrays instead
of single values, reducing pack/unpack call overhead
- Replace BinaryBufferReader's DataSize objects and Bytes allocations
with raw int tracking and string returns, eliminating object creation on
hot paths
- Skip Dremel shredding/assembly for flat non-nested columns, bypassing
unnecessary FlatValue object creation and definition level conversion
- Replace array_merge with array_push in ColumnChunkBuilders and batch
statistics tracking to avoid O(n²) array copying
- Memoize maxDefinitionsLevel/maxRepetitionsLevel on schema columns and
cache flatPath in local variables to cut repeated method calls
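
A minimal sketch of two of the techniques listed above, assuming list-style (integer-keyed) arrays of 32-bit integers. The function names are hypothetical, chosen for illustration only; they are not taken from this commit or from the library's API.

```php
<?php
// Illustrative sketch only: helper names are hypothetical, not the library's API.

/** Encode a whole array of little-endian 32-bit integers with one pack() call. */
function encodeInt32Batch(array $values): string
{
    return $values === [] ? '' : pack('V*', ...$values);
}

/** Decode every little-endian 32-bit integer in $bytes with one unpack() call. */
function decodeInt32Batch(string $bytes): array
{
    if ($bytes === '') {
        return [];
    }

    // unpack() returns a 1-indexed array, so reindex from 0.
    return array_values(unpack('V*', $bytes));
}

/**
 * Append $batch to $target in place. Unlike `$target = array_merge($target, $batch)`,
 * this does not rebuild the accumulated array on every call.
 * Assumes $batch is a list (no string keys).
 */
function appendBatch(array &$target, array $batch): void
{
    if ($batch !== []) {
        array_push($target, ...$batch);
    }
}
```

Calling `array_merge` inside a loop copies all previously accumulated elements each time, which is where the quadratic cost comes from; appending in place keeps each batch proportional to its own size.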
refactor: parquet writer from row-by-row to columnar shredding
refactor: replace serialize with pack on primitive values
refactor: optimize flat columns writing
refactor: make dremel shredder precache shredding plans
1 parent a917ec6 commit 999aec6
69 files changed
Lines changed: 2903 additions & 2722 deletions
File tree
- src/lib/parquet
- src/Flow/Parquet
- BinaryReader
- Binary
- Data
- Dremel
- ColumnData
- ParquetFile
- Data
- Converter
- Page/Header
- RowGroup
- Schema
- Reader
- Writer
- ColumnChunkBuilder
- PageBuilder
- DictionaryBuilder
- tests/Flow/Parquet/Tests
- Integration
- Binary
- IO
- ParquetFile
- Writer
- ColumnChunkBuilder
- PageBuilder
- Unit
- BinaryReader
- Binary
- Data
- Dremel
- ColumnData
- Writer
- ColumnChunkBuilder
- PageBuilder