Skip to content

Conversation

@antonysigma
Copy link
Contributor

@antonysigma antonysigma commented Dec 5, 2025

Request for comments: I have this nice patch in C++17 that avoids excessive std::vector construction and destruction when I do frequent dataset.select(...).read() and write from multiple threads. Happy to polish the code with you all to make it compile with C++14.

Implement HighFive::RegularHyperSlabNoMalloc<Rank> that takes hyperslab variables in stack space. Do not construct std::vector<hsize_t>.

Similarly, implement ::select(RegularHyperSlabNoMalloc<>) such that it is free of operator new[] calls before invoking H5Sselect_hyperslab.

Ensure that when hyperslab offset and range are compile-time constants, compiler (with arguments -O3 -flto) will inline everything without any operator new[] statements.

@1uc
Copy link
Contributor

1uc commented Dec 6, 2025

Thank you for sharing this. There's two common reasons you'd want this:

  • embedded (when dynamic allocations are banned),
  • performance.

If it's a performance issue and you have measurements/benchmarks that you're allowed to share, would you mind doing so? There's a different approach that might solve the issue for all users: pooled allocation. The advantage would be that everyone gets to benefit, and there's only one way of doing this.

Generally, could you explain a bit why you want this? I'm not at all against it, if it solves a (reasonably common) issue. However, since it duplicates existing functionality, it needs a little bit of consideration.

@antonysigma antonysigma force-pushed the compile-time-regular-hyperslab branch 2 times, most recently from 1501238 to 62cd478 Compare December 7, 2025 04:52
Implement `HighFive::RegularHyperSlabNoMalloc<Rank>` that takes
hyperslab variables in stack space. Do not construct
`std::vector<hsize_t>`.

Similarly, implement `::select(RegularHyperSlabNoMalloc<>)` such that it
is free of `operator new[]` calls before invoking `H5Sselect_hyperslab`.

Ensure that when hyperslab offset and range are compile-time constants,
compiler (with arguments `-O3 -flto`) will inline everything without
any `operator new[]` statements.
@antonysigma antonysigma force-pushed the compile-time-regular-hyperslab branch from 62cd478 to 49151a0 Compare December 8, 2025 05:38
@antonysigma
Copy link
Contributor Author

Hi @1uc , thank you for reviewing the draft code. Yes, more than 50% of my projects involves (A) deploying the dynamic library libhdf5.so and my executable onto a soft-real time system having a weak CPU. Similar to the specification of the 2019-era Jetson Nano. So, there is a strict coding restrictions on dynamic allocations in the camera capture & File IO event loop.

And then, (B) around 5% of my project involves statically linking libhdf5.a and HighFive with mingw64 cross-compiler. The output EXE must be statically linked. And what's more, the objdump output of the compiled exe must prove the absence of the operator new[] and malloc in the performance critical frame read/write paths. Malloc calls from libhdf5.a is acceptable as long as it is outside the critical loop.

So it is less of a performance requirement, but more of solving the latency at the root, eliminating all dynamic allocations during IO.

Also acknowledged that pooled allocation (also called arena allocator) is available as a drop-in replacement, such as TCMalloc, and MiMalloc. It works well with (A), but failed miserably with (B) because of the lack of mingw supports.

P.S. Just curious... how much idea cross-pollination happens between HighFive and Steven-Verga-H5Cpp? I concur that my PR is more suitable there, as constexpr and consteval were used extensively by H5Cpp devs to ensure compile-time hardcoding of offset and count variables.

Too bad that H5Cpp developers stopped maintaining the project for over 4 years. That's why I favor HighFive over H5Cpp.

@antonysigma antonysigma changed the title Avoid memory allocation with H5Sselect_hyperslab Avoid dynamic allocation with H5Sselect_hyperslab Dec 8, 2025
@antonysigma antonysigma changed the title Avoid dynamic allocation with H5Sselect_hyperslab Avoid dynamic allocation with H5select_hyperslab Dec 8, 2025
@antonysigma antonysigma changed the title Avoid dynamic allocation with H5select_hyperslab Avoid dynamic allocation with H5Sselect_hyperslab Dec 8, 2025
template <typename... Args>
inline DataSpace::DataSpace(size_t dim1, Args... dims)
: DataSpace(std::vector<size_t>{dim1, static_cast<size_t>(dims)...}) {}
: DataSpace(std::array{dim1, static_cast<size_t>(dims)...}) {}
Copy link
Contributor

@1uc 1uc Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also easy to merge. (Technically, it uses C++17, but it's easy to fix.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Help wanted to refactoring the code for c++14.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, no worries :) Same goes for all other requests for help. If I have permission to push commits to your branch, that's the easiest way for me to make the changes. I'll try to find time tomorrow. Nice feature, thank you.

auto memspace = DataSpace(std::array<size_t, 1>{n_elements});

return detail::make_selection(memspace, filespace, details::get_dataset(slice));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If RegularHyperSlabNoMalloc where user defined, I don't know how to solve this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you might have noticed I'm not so keen on having both a statically sized version and fixed-sized version. Especially, because it's currently not done for performance reasons and HighFive is very consistent about using std::vector for shapes.

Now for the good news: we could use SFINAE for select allowing users to pass in any type for which hyper_slab.apply(filespace) is valid (and maybe something else). Once, HighFive can accept anything "slab-like", there's no need to implement select specifically for RegularHyperSlabNoMalloc. For now RegularHyperSlabNoMalloc could live outside of HighFive (or in an example). That way even if you fork HighFive, rebasing your changes would be trivial, e.g. if you put RegularHyperSlabNoMalloc in a separate header, the chance of a merge conflict seems zero. Also, not adding it now doesn't mean we can't ever add it.

@antonysigma Would this work for you? The SFINAE part isn't strictly needed, in a first version an unconstrained template parameter would suffice. That way if you don't feel like fighting SFINAE, I can take care of it later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: moving RegularHyperSlabNoMalloc<Rank> to src/examples/ folder. For sure! Please check out the new changes.

Re: SFINAE for select(const T& hyper_slab). Sounds good to me. Please review the new changes following your advises. I suppose the system can tolerate the additional H5Sget_select_npoints calls for now. Once HighFive project migrates to c++17, we have more advanced tricks to compute npoints at compile-time.

Would this work for you? The SFINAE part isn't strictly needed, in a first version an unconstrained template parameter would suffice. That way if you don't feel like fighting SFINAE, I can take care of it later.

Yes, help wanted to tighten the constraints eventually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll first try with H5Sget_select_npoints if profiling/analysis shows it's bad, then computing the size of the hyperslab will become (optional if need be) part of the HyperSlab interface.

Copy link
Contributor

@1uc 1uc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you not need HyperSlab, i.e. the combination of multiple regular hyperslabs?

@1uc
Copy link
Contributor

1uc commented Dec 11, 2025

@antonysigma sorry for the delay.

To the best of my knowledge there's very little connection to H5Cpp. HighFive was started by (I believe) a PhD student at BBP. Later the HPC group of BBP took over maintenance and expanded it considerable. Much later I joined and made it more robust and helped or added some advanced features.

It's a little surprising that you're able to use HighFive at all in those settings. HighFive doesn't play well with 32 architectures, because it uses size_t as if it were equivalent to hsize_t. (Accidentally, that works on 64-bit, but we've had credible reports that it breaks on 32-bit. The naive fix is too invasive; and we've never tried to solve it cleanly without breaking for existing users.)

The second reason it's surprising is because HighFive is quite liberal about using std::vector for shape. It's everywhere.

Copy link
Contributor Author

@antonysigma antonysigma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @1uc ,

Thank you for the code review. I moved RegularHyperSlabNoMalloc<Rank> out of the include/ folder. Please read the inlined comments below.

Do you not need HyperSlab, i.e. the combination of multiple regular hyperslabs?

I am good so far. Almost all of the embedded soft real-time use cases do not require non-regular hyperslab. Actually, Python h5py community reported similar performance regression of union selection versus calling individual select(...).read(...) separately.

To the best of my knowledge there's very little connection to H5Cpp. HighFive was started by (I believe) a PhD student at BBP. Later the HPC group of BBP took over maintenance and expanded it considerable. Much later I joined and made it more robust and helped or added some advanced features.

Thanks. I also tracked down the presentation transcript. Yes, Highfive and h5cpp serves two different users. The former serves 5D (dimensions: XYZTC) microscopy datasets residing in heap space memory, and the latter serves high-frequency trading data (i.e. Tables with scalar values in table cells) residing in stack space memory.

Also adding @steven-varga here for context.

It's a little surprising that you're able to use HighFive at all in those settings. HighFive doesn't play well with 32 architectures, because it uses size_t as if it were equivalent to hsize_t. (Accidentally, that works on 64-bit, but we've had credible reports that it breaks on 32-bit. The naive fix is too invasive; and we've never tried to solve it cleanly without breaking for existing users.)

Nvidia Jetson Nano (maxwell chipset) has 64-bit ARM CPUs. I suppose you mean Xilinx Zynq7010. Ah, that explains a lot about the runtime errors that I experienced with Zynq7010 and Highfive back then. Thank you for the reminder.

The second reason it's surprising is because HighFive is quite liberal about using std::vector for shape. It's everywhere.

Fortunately, my projects does not require full MISRA/JSF compliance.
I only need to demonstrate that the critical IO path contains no operator new or malloc calls via the objdump tool.

Speaking of which, I was expecting C++20's constexpr std::vector<> to eliminate most of the transient memory allocations in HighFive at compile-time. So, it is perfectly fine for us to stay put, and let the ISO-C++ committee do the work for us.

AFAIK, the C++20 compliant compiler eliminates all memory allocations of std::vector at compile-time, if it can prove that:

  1. all vector objects are constructed and destructed in the function,
  2. function arguments do not contain pointers or std::vector::iterator, and
  3. the function does not leak the pointers of the locally constructed vector in the return statement, or leak the pointer to anywhere else outside the scope.

template <typename... Args>
inline DataSpace::DataSpace(size_t dim1, Args... dims)
: DataSpace(std::vector<size_t>{dim1, static_cast<size_t>(dims)...}) {}
: DataSpace(std::array{dim1, static_cast<size_t>(dims)...}) {}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Help wanted to refactoring the code for c++14.

auto memspace = DataSpace(std::array<size_t, 1>{n_elements});

return detail::make_selection(memspace, filespace, details::get_dataset(slice));
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: moving RegularHyperSlabNoMalloc<Rank> to src/examples/ folder. For sure! Please check out the new changes.

Re: SFINAE for select(const T& hyper_slab). Sounds good to me. Please review the new changes following your advises. I suppose the system can tolerate the additional H5Sget_select_npoints calls for now. Once HighFive project migrates to c++17, we have more advanced tricks to compute npoints at compile-time.

Would this work for you? The SFINAE part isn't strictly needed, in a first version an unconstrained template parameter would suffice. That way if you don't feel like fighting SFINAE, I can take care of it later.

Yes, help wanted to tighten the constraints eventually.

Comment on lines 1 to 8
/*
* Copyright (c), 2017, Adrien Devresse
*
* Distributed under the Boost Software License, Version 1.0.
* (See accompanying file LICENSE_1_0.txt or copy at
* http://www.boost.org/LICENSE_1_0.txt)
*
*/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind the author to be Adrien, or who ever can compose the CMake file for me. Help wanted here to compose the CMakeLists.txt.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say it's either your name or "HighFive developers" and the year is 2025. Let me know which one you prefer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sure. My name "Antony Chan".

@antonysigma antonysigma marked this pull request as ready for review December 11, 2025 19:16
@1uc
Copy link
Contributor

1uc commented Dec 12, 2025

@antonysigma before I forget. Combining multiple hyperslabs requires care:

  1. Combining N regular hyperslabs naively in a loop takes quadratic runtime. In hindsight it's obvious: A hyperslab is stored as (the N-dimensional equivalent of) a sorted linked list of non-overlapping regular hyperslabs. The issue is that finding the location to insert the next block takes O(N) time, because one need to traverse the list. Therefore, O(N**2) to insert N slabs iteratively. There's API that can combine to general hyperslabs in time O(n + m). This allows one to do divide and conquer. It's implemented in HighFive, see e9492c1 I reported it upstream: https://forum.hdfgroup.org/t/quadratic-runtime-of-selecting-n-hyperslabs/12555

  2. Using Darshan we saw that HDF5 didn't combine reads. Therefore, if you select [1, 3, 5, 7, 9] it might be better to load [1, 2, 3, 4, 5, 6, 7, 8, 9] into a buffer and discard all even values. We did this in a different layer of BBP's I/O stack. It can be found here, it does the obvious (and only for a specific case) so I'm not sure it's worth reading: https://github.com/BlueBrain/libsonata/blame/master/src/read_canonical_selection.hpp

The changes are:

  - Make HyperSlabInterface an CRTP base class.
  - Make the example shorter, C++14 and rename it.
@1uc
Copy link
Contributor

1uc commented Dec 12, 2025

Right, we can't get CI green (edit: nvm, today's a good day and ubuntu-22.04 still works), because the coverage CI doesn't work on ubuntu-24.04. @antonysigma if the remaining CI goes green, would you like to test it with your application first before we merge?

@codecov
Copy link

codecov bot commented Dec 12, 2025

Codecov Report

❌ Patch coverage is 30.30303% with 23 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/examples/select_partial_dataset_no_alloc.cpp 0.00% 23 Missing ⚠️

📢 Thoughts on this report? Let us know!

@antonysigma
Copy link
Contributor Author

Right, we can't get CI green (edit: nvm, today's a good day and ubuntu-22.04 still works), because the coverage CI doesn't work on ubuntu-24.04. @antonysigma if the remaining CI goes green, would you like to test it with your application first before we merge?

@1uc It works for me. Thank you for revising the code.

@1uc 1uc merged commit 7c105ea into highfive-devs:main Dec 15, 2025
37 checks passed
@antonysigma antonysigma deleted the compile-time-regular-hyperslab branch December 15, 2025 19:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants