Avoid dynamic allocation with `H5Sselect_hyperslab` #101

antonysigma · 2025-12-05T20:23:35Z

Request for comments: I have this nice patch in C++17 that avoids excessive `std::vector` construction and destruction when I make frequent `dataset.select(...).read()` and `write()` calls from multiple threads. Happy to polish the code with you all to make it compile with C++14.

Implement `HighFive::RegularHyperSlabNoMalloc<Rank>` that takes hyperslab variables in stack space. Do not construct `std::vector<hsize_t>`.

Similarly, implement `::select(RegularHyperSlabNoMalloc<>)` such that it is free of `operator new[]` calls before invoking `H5Sselect_hyperslab`.

Ensure that when the hyperslab offset and range are compile-time constants, the compiler (with arguments `-O3 -flto`) will inline everything without any `operator new[]` statements.
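To make the idea concrete, here is a minimal sketch of a fixed-rank hyperslab whose parameters live entirely inside the object. It is not the code from this PR: `StackHyperSlab` and `start_pointer` are hypothetical names, and `hsize_t` is replaced by a stand-in typedef so the snippet compiles without the HDF5 headers.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Stand-in for HDF5's hsize_t so the sketch compiles without <hdf5.h>.
using hsize_t = std::uint64_t;

// Hypothetical fixed-rank hyperslab: all four parameter arrays are stored
// inside the object, so constructing one never touches the heap.
template <std::size_t Rank>
struct StackHyperSlab {
    std::array<hsize_t, Rank> offset;
    std::array<hsize_t, Rank> stride;
    std::array<hsize_t, Rank> count;
    std::array<hsize_t, Rank> block;
};

static_assert(std::is_trivially_copyable<StackHyperSlab<3>>::value,
              "plain data, no heap-owning members");

// With the real library, the raw pointers would be handed straight to
//   H5Sselect_hyperslab(space_id, H5S_SELECT_SET,
//                       slab.offset.data(), slab.stride.data(),
//                       slab.count.data(), slab.block.data());
// This helper only exposes the pointer, to show that no copy into a
// std::vector is needed before such a call.
template <std::size_t Rank>
const hsize_t* start_pointer(const StackHyperSlab<Rank>& slab) {
    return slab.offset.data();
}
```

Because the rank is a template parameter, the arrays have a size known at compile time, which is what gives the optimizer a chance to inline constant offsets and counts.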

1uc · 2025-12-06T15:00:55Z

Thank you for sharing this. There are two common reasons you'd want this:

- embedded (when dynamic allocations are banned),
- performance.

If it's a performance issue and you have measurements/benchmarks that you're allowed to share, would you mind doing so? There's a different approach that might solve the issue for all users: pooled allocation. The advantage would be that everyone gets to benefit, and there's only one way of doing this.

Generally, could you explain a bit why you want this? I'm not at all against it, if it solves a (reasonably common) issue. However, since it duplicates existing functionality, it needs a little bit of consideration.

antonysigma · 2025-12-08T18:30:19Z

Hi @1uc, thank you for reviewing the draft code. Yes, more than 50% of my projects involve (A) deploying the dynamic library libhdf5.so and my executable onto a soft real-time system with a weak CPU, similar in specification to the 2019-era Jetson Nano. So, there are strict coding restrictions on dynamic allocations in the camera capture and file I/O event loop.

And then, (B) around 5% of my projects involve statically linking libhdf5.a and HighFive with the mingw64 cross-compiler. The output EXE must be statically linked. What's more, the objdump output of the compiled EXE must prove the absence of operator new[] and malloc in the performance-critical frame read/write paths. Malloc calls from libhdf5.a are acceptable as long as they are outside the critical loop.

So it is less a performance requirement and more about solving latency at the root by eliminating all dynamic allocations during I/O.

I also acknowledge that pooled allocation (also called an arena allocator) is available as a drop-in replacement, such as TCMalloc and mimalloc. It works well with (A), but failed miserably with (B) because of the lack of mingw support.

P.S. Just curious... how much idea cross-pollination happens between HighFive and Steven Varga's H5Cpp? I admit that my PR may be more suitable there, as constexpr and consteval are used extensively by the H5Cpp devs to ensure compile-time hardcoding of offset and count variables.

Too bad that the H5Cpp developers have not maintained the project for over 4 years. That's why I favor HighFive over H5Cpp.

1uc · 2025-12-11T09:56:19Z

include/highfive/bits/H5Dataspace_misc.hpp

 template <typename... Args>
 inline DataSpace::DataSpace(size_t dim1, Args... dims)
-    : DataSpace(std::vector<size_t>{dim1, static_cast<size_t>(dims)...}) {}
+    : DataSpace(std::array{dim1, static_cast<size_t>(dims)...}) {}


Also easy to merge. (Technically, it uses C++17, but it's easy to fix.)

Thanks! Help wanted with refactoring the code for C++14.

Yes, no worries :) Same goes for all other requests for help. If I have permission to push commits to your branch, that's the easiest way for me to make the changes. I'll try to find time tomorrow. Nice feature, thank you.
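For reference, the C++17 part of the diff above is the class template argument deduction in `std::array{...}`. One possible C++14 spelling is sketched below; `make_dims` is a hypothetical helper for illustration, not a HighFive function, and it only shows how the element type and length can be written out explicitly.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not part of HighFive): builds the same array that the
// C++17 line deduces via CTAD, but names the type explicitly so that a
// C++14 compiler accepts it.
template <typename... Args>
std::array<std::size_t, 1 + sizeof...(Args)> make_dims(std::size_t dim1, Args... dims) {
    // The length is 1 (for dim1) plus the size of the parameter pack.
    // Double braces avoid brace-elision warnings on older compilers.
    return {{dim1, static_cast<std::size_t>(dims)...}};
}
```

Calling `make_dims(3, 4, 5)` yields a `std::array<std::size_t, 3>` holding 3, 4 and 5, with no heap allocation involved.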

1uc · 2025-12-11T09:58:53Z

include/highfive/bits/H5Slice_traits_misc.hpp

+    auto memspace = DataSpace(std::array<size_t, 1>{n_elements});
+
+    return detail::make_selection(memspace, filespace, details::get_dataset(slice));
+}


If RegularHyperSlabNoMalloc were user-defined, I don't know how to solve this.

As you might have noticed, I'm not so keen on having both a dynamically sized version and a fixed-size version, especially because it's currently not done for performance reasons and HighFive is very consistent about using std::vector for shapes.

Now for the good news: we could use SFINAE for select, allowing users to pass in any type for which hyper_slab.apply(filespace) is valid (and maybe something else). Once HighFive can accept anything "slab-like", there's no need to implement select specifically for RegularHyperSlabNoMalloc. For now RegularHyperSlabNoMalloc could live outside of HighFive (or in an example). That way, even if you fork HighFive, rebasing your changes would be trivial; e.g. if you put RegularHyperSlabNoMalloc in a separate header, the chance of a merge conflict seems zero. Also, not adding it now doesn't mean we can't ever add it.

@antonysigma Would this work for you? The SFINAE part isn't strictly needed, in a first version an unconstrained template parameter would suffice. That way if you don't feel like fighting SFINAE, I can take care of it later.
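The "slab-like" constraint discussed here could look roughly like the following C++14 sketch. Everything in the `sketch` namespace is hypothetical: `FakeSpace` stands in for a dataspace type, and the `select` shown is not HighFive's actual signature. It only illustrates detecting whether `slab.apply(space)` is a valid expression.

```cpp
#include <type_traits>
#include <utility>

namespace sketch {

// Hypothetical stand-in for a dataspace type, so the sketch is self-contained.
struct FakeSpace {
    int selected = 0;
};

// C++14 spelling of std::void_t.
template <typename...>
struct make_void { using type = void; };
template <typename... Ts>
using void_t = typename make_void<Ts...>::type;

// Trait: true when `slab.apply(space)` is a valid expression.
template <typename T, typename = void>
struct is_slab_like : std::false_type {};

template <typename T>
struct is_slab_like<
    T, void_t<decltype(std::declval<const T&>().apply(std::declval<FakeSpace&>()))>>
    : std::true_type {};

// Hypothetical `select`: takes part in overload resolution only for
// slab-like types, so unrelated arguments produce a clean "no match" error.
template <typename Slab>
typename std::enable_if<is_slab_like<Slab>::value, FakeSpace>::type
select(FakeSpace space, const Slab& slab) {
    slab.apply(space);
    return space;
}

// A user-defined slab that lives outside the library.
struct TwoElementSlab {
    void apply(FakeSpace& space) const { space.selected = 2; }
};

}  // namespace sketch
```

The unconstrained first version mentioned above would simply drop the `enable_if` and accept any `Slab`; the trait only improves the error message when a type without `apply` is passed.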

Re: moving RegularHyperSlabNoMalloc<Rank> to the src/examples/ folder. For sure! Please check out the new changes.

Re: SFINAE for select(const T& hyper_slab). Sounds good to me. Please review the new changes, which follow your advice. I suppose the system can tolerate the additional H5Sget_select_npoints calls for now. Once the HighFive project migrates to C++17, we will have more advanced tricks to compute npoints at compile time.

> Would this work for you? The SFINAE part isn't strictly needed, in a first version an unconstrained template parameter would suffice. That way if you don't feel like fighting SFINAE, I can take care of it later.

Yes, help wanted to tighten the constraints eventually.

We'll first try with H5Sget_select_npoints; if profiling/analysis shows it's bad, then computing the size of the hyperslab will become part of the HyperSlab interface (optional, if need be).
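If the size computation ever does move into the slab itself, a sketch of it for a single regular hyperslab might look like this. The function name `n_selected` and the `hsize_t` typedef are assumptions for illustration. The formula relies on the fact that a regular hyperslab selects `count[i]` blocks of `block[i]` elements along each dimension.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using hsize_t = std::uint64_t;  // stand-in for HDF5's hsize_t

// Number of elements selected by one regular hyperslab. Offset and stride
// move the blocks around but do not change how many elements are picked,
// so only count and block enter the product.
template <std::size_t Rank>
constexpr hsize_t n_selected(const std::array<hsize_t, Rank>& count,
                             const std::array<hsize_t, Rank>& block) {
    hsize_t n = 1;
    for (std::size_t i = 0; i < Rank; ++i) {
        n *= count[i] * block[i];
    }
    return n;
}
```

Since the function is `constexpr` and valid C++14, a slab with compile-time constant counts and blocks can have its size folded into a constant, without the extra library call.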

1uc

Do you not need HyperSlab, i.e. the combination of multiple regular hyperslabs?

1uc · 2025-12-11T10:03:05Z

@antonysigma sorry for the delay.

To the best of my knowledge there's very little connection to H5Cpp. HighFive was started by (I believe) a PhD student at BBP. Later the HPC group of BBP took over maintenance and expanded it considerably. Much later I joined and made it more robust and helped with or added some advanced features.

It's a little surprising that you're able to use HighFive at all in those settings. HighFive doesn't play well with 32-bit architectures, because it uses size_t as if it were equivalent to hsize_t. (Accidentally, that works on 64-bit, but we've had credible reports that it breaks on 32-bit. The naive fix is too invasive, and we've never tried to solve it cleanly without breaking things for existing users.)
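To illustrate the size_t versus hsize_t mismatch, here is a small sketch of a checked conversion. The helper `to_size_t` is hypothetical and not part of HighFive, and `hsize_t` is modelled as a 64-bit unsigned integer, which is an assumption of this snippet.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

using hsize_t = std::uint64_t;  // stand-in; assumed to be 64-bit unsigned

// Hypothetical helper: converts a file-side extent to an in-memory size and
// refuses values that do not fit, which can happen where size_t is 32 bits.
inline std::size_t to_size_t(hsize_t n) {
    if (n > std::numeric_limits<std::size_t>::max()) {
        throw std::overflow_error("hsize_t value does not fit in size_t");
    }
    return static_cast<std::size_t>(n);
}
```

On a 64-bit target the check can never fail, which is consistent with the observation that treating the two types as interchangeable works there by accident.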

The second reason it's surprising is because HighFive is quite liberal about using std::vector for shape. It's everywhere.

antonysigma

Hi @1uc,

Thank you for the code review. I moved RegularHyperSlabNoMalloc<Rank> out of the include/ folder. Please read the inline comments below.

> Do you not need HyperSlab, i.e. the combination of multiple regular hyperslabs?

I am good so far. Almost all of the embedded soft real-time use cases do not require non-regular hyperslabs. Actually, the Python h5py community has reported a similar performance regression for union selections versus calling individual select(...).read(...) separately.

> To the best of my knowledge there's very little connection to H5Cpp. HighFive was started by (I believe) a PhD student at BBP. Later the HPC group of BBP took over maintenance and expanded it considerably. Much later I joined and made it more robust and helped with or added some advanced features.

Thanks. I also tracked down the presentation transcript. Yes, HighFive and H5Cpp serve two different user groups. The former serves 5D (dimensions: XYZTC) microscopy datasets residing in heap memory, and the latter serves high-frequency trading data (i.e. tables with scalar values in table cells) residing in stack memory.

Also adding @steven-varga here for context.

> It's a little surprising that you're able to use HighFive at all in those settings. HighFive doesn't play well with 32-bit architectures, because it uses size_t as if it were equivalent to hsize_t. (Accidentally, that works on 64-bit, but we've had credible reports that it breaks on 32-bit. The naive fix is too invasive, and we've never tried to solve it cleanly without breaking things for existing users.)

The Nvidia Jetson Nano (Maxwell chipset) has 64-bit ARM CPUs. I suppose you mean the Xilinx Zynq7010. Ah, that explains a lot about the runtime errors that I experienced with the Zynq7010 and HighFive back then. Thank you for the reminder.

> The second reason it's surprising is because HighFive is quite liberal about using std::vector for shape. It's everywhere.

Fortunately, my projects do not require full MISRA/JSF compliance. I only need to demonstrate, via the objdump tool, that the critical I/O path contains no operator new or malloc calls.

Speaking of which, I was expecting C++20's constexpr std::vector<> to eliminate most of the transient memory allocations in HighFive at compile time. So, it is perfectly fine for us to stay put and let the ISO C++ committee do the work for us.

AFAIK, a C++20-compliant compiler can eliminate the memory allocations of std::vector at compile time if it can prove that:

- all vector objects are constructed and destructed within the function,
- the function arguments do not contain pointers or std::vector::iterator, and
- the function does not leak pointers to the locally constructed vector through the return statement or to anywhere else outside the scope.

antonysigma · 2025-12-11T18:29:04Z

src/examples/select_partial_dataset_cpp17.cpp

+/*
+ *  Copyright (c), 2017, Adrien Devresse
+ *
+ *  Distributed under the Boost Software License, Version 1.0.
+ *    (See accompanying file LICENSE_1_0.txt or copy at
+ *          http://www.boost.org/LICENSE_1_0.txt)
+ *
+ */


I don't mind the author being Adrien, or whoever can compose the CMake file for me. Help wanted here to compose the CMakeLists.txt.

I'd say it's either your name or "HighFive developers" and the year is 2025. Let me know which one you prefer.

For sure. My name: "Antony Chan".

1uc · 2025-12-12T12:06:25Z

@antonysigma before I forget. Combining multiple hyperslabs requires care:

- Combining N regular hyperslabs naively in a loop takes quadratic runtime. In hindsight it's obvious: a hyperslab is stored as (the N-dimensional equivalent of) a sorted linked list of non-overlapping regular hyperslabs. The issue is that finding the location to insert the next block takes O(N) time, because one needs to traverse the list. Therefore, it takes O(N**2) to insert N slabs iteratively. There's an API that can combine two general hyperslabs in time O(n + m). This allows one to do divide and conquer. It's implemented in HighFive, see e9492c1. I reported it upstream: https://forum.hdfgroup.org/t/quadratic-runtime-of-selecting-n-hyperslabs/12555
- Using Darshan we saw that HDF5 didn't combine reads. Therefore, if you select [1, 3, 5, 7, 9] it might be better to load [1, 2, 3, 4, 5, 6, 7, 8, 9] into a buffer and discard all even values. We did this in a different layer of BBP's I/O stack. It can be found here; it does the obvious (and only for a specific case), so I'm not sure it's worth reading: https://github.com/BlueBrain/libsonata/blame/master/src/read_canonical_selection.hpp
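The divide-and-conquer idea from the first point can be sketched with a toy one-dimensional model. This is not HDF5 or HighFive code: a selection is modelled as a sorted vector of disjoint half-open ranges, and `combine` / `combine_all` are hypothetical names. The point is only that a linear-time pairwise union, applied recursively over halves, avoids the quadratic cost of inserting slabs one at a time.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Toy 1-D model of a selection: sorted, non-overlapping half-open ranges.
using Range = std::pair<std::size_t, std::size_t>;  // [first, second)
using Selection = std::vector<Range>;

// Union of two selections in O(n + m): merge by start, then fuse neighbours
// that touch or overlap.
Selection combine(const Selection& a, const Selection& b) {
    Selection merged;
    merged.reserve(a.size() + b.size());
    std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(merged));

    Selection out;
    for (const Range& r : merged) {
        if (!out.empty() && r.first <= out.back().second) {
            out.back().second = std::max(out.back().second, r.second);
        } else {
            out.push_back(r);
        }
    }
    return out;
}

// Divide and conquer over slabs[lo, hi): each recursion level touches every
// range at most once, so the total work in this toy model is roughly
// O(N log N) rather than O(N^2).
Selection combine_all(const std::vector<Range>& slabs, std::size_t lo, std::size_t hi) {
    if (hi - lo == 0) return {};
    if (hi - lo == 1) return {slabs[lo]};
    const std::size_t mid = lo + (hi - lo) / 2;
    return combine(combine_all(slabs, lo, mid), combine_all(slabs, mid, hi));
}
```

For example, combining the ranges [8, 10), [0, 2), [4, 6) and [2, 4) this way gives the two ranges [0, 6) and [8, 10).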

The changes are:

- Make HyperSlabInterface a CRTP base class.
- Make the example shorter, C++14 and rename it.

1uc · 2025-12-12T13:11:41Z

Right, ~~we can't get CI green~~ (edit: nvm, today's a good day and ubuntu-22.04 still works), because the coverage CI doesn't work on ubuntu-24.04. @antonysigma if the remaining CI goes green, would you like to test it with your application first before we merge?

codecov · 2025-12-12T13:17:52Z

Codecov Report

❌ Patch coverage is 30.30303% with 23 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/examples/select_partial_dataset_no_alloc.cpp | 0.00% | 23 Missing ⚠️ |

antonysigma · 2025-12-13T21:39:37Z

> Right, ~~we can't get CI green~~ (edit: nvm, today's a good day and ubuntu-22.04 still works), because the coverage CI doesn't work on ubuntu-24.04. @antonysigma if the remaining CI goes green, would you like to test it with your application first before we merge?

@1uc It works for me. Thank you for revising the code.

antonysigma force-pushed the compile-time-regular-hyperslab branch 2 times, most recently from 1501238 to 62cd478 Compare December 7, 2025 04:52

antonysigma force-pushed the compile-time-regular-hyperslab branch from 62cd478 to 49151a0 Compare December 8, 2025 05:38

antonysigma changed the title ~~Avoid memory allocation with H5Sselect_hyperslab~~ Avoid dynamic allocation with H5Sselect_hyperslab Dec 8, 2025

antonysigma changed the title ~~Avoid dynamic allocation with H5Sselect_hyperslab~~ Avoid dynamic allocation with H5select_hyperslab Dec 8, 2025

antonysigma changed the title ~~Avoid dynamic allocation with H5select_hyperslab~~ Avoid dynamic allocation with H5Sselect_hyperslab Dec 8, 2025

1uc reviewed Dec 11, 2025

include/highfive/H5DataSpace.hpp (resolved)

1uc reviewed Dec 11, 2025

include/highfive/bits/H5Dataspace_misc.hpp (resolved)

1uc reviewed Dec 11, 2025

include/highfive/bits/H5Slice_traits.hpp (outdated, resolved)

1uc reviewed Dec 11, 2025

antonysigma added 2 commits December 11, 2025 09:25

Move RegularHyperSlabNoMalloc out of include/

dd8848d

Move std::optional to example folder

0879657

antonysigma commented Dec 11, 2025

antonysigma marked this pull request as ready for review December 11, 2025 19:16

fixup: review comments.

c00d39a

The changes are:

- Make HyperSlabInterface a CRTP base class.
- Make the example shorter, C++14 and rename it.

1uc merged commit 7c105ea into highfive-devs:main Dec 15, 2025
37 checks passed

antonysigma deleted the compile-time-regular-hyperslab branch December 15, 2025 19:23
