-
Notifications
You must be signed in to change notification settings - Fork 15
Avoid dynamic allocation with H5Sselect_hyperslab
#101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid dynamic allocation with H5Sselect_hyperslab
#101
Conversation
|
Thank you for sharing this. There's two common reasons you'd want this:
If it's a performance issue and you have measurements/benchmarks that you're allowed to share, would you mind doing so? There's a different approach that might solve the issue for all users: pooled allocation. The advantage would be that everyone gets to benefit, and there's only one way of doing this. Generally, could you explain a bit why you want this? I'm not at all against it, if it solves a (reasonably common) issue. However, since it duplicates existing functionality, it needs a little bit of consideration. |
1501238 to
62cd478
Compare
Implement `HighFive::RegularHyperSlabNoMalloc<Rank>` that takes hyperslab variables in stack space. Do not construct `std::vector<hsize_t>`. Similarly, implement `::select(RegularHyperSlabNoMalloc<>)` such that it is free of `operator new[]` calls before invoking `H5Sselect_hyperslab`. Ensure that when hyperslab offset and range are compile-time constants, compiler (with arguments `-O3 -flto`) will inline everything without any `operator new[]` statements.
62cd478 to
49151a0
Compare
|
Hi @1uc , thank you for reviewing the draft code. Yes, more than 50% of my projects involves (A) deploying the dynamic library And then, (B) around 5% of my project involves statically linking So it is less of a performance requirement, but more of solving the latency at the root, eliminating all dynamic allocations during IO. Also acknowledged that pooled allocation (also called arena allocator) is available as a drop-in replacement, such as TCMalloc, and MiMalloc. It works well with (A), but failed miserably with (B) because of the lack of mingw supports. P.S. Just curious... how much idea cross-pollination happens between HighFive and Steven-Verga-H5Cpp? I concur that my PR is more suitable there, as Too bad that H5Cpp developers stopped maintaining the project for over 4 years. That's why I favor HighFive over H5Cpp. |
H5Sselect_hyperslabH5Sselect_hyperslab
H5Sselect_hyperslabH5select_hyperslab
H5select_hyperslabH5Sselect_hyperslab
| template <typename... Args> | ||
| inline DataSpace::DataSpace(size_t dim1, Args... dims) | ||
| : DataSpace(std::vector<size_t>{dim1, static_cast<size_t>(dims)...}) {} | ||
| : DataSpace(std::array{dim1, static_cast<size_t>(dims)...}) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also easy to merge. (Technically, it uses C++17, but it's easy to fix.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Help wanted to refactoring the code for c++14.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, no worries :) Same goes for all other requests for help. If I have permission to push commits to your branch, that's the easiest way for me to make the changes. I'll try to find time tomorrow. Nice feature, thank you.
| auto memspace = DataSpace(std::array<size_t, 1>{n_elements}); | ||
|
|
||
| return detail::make_selection(memspace, filespace, details::get_dataset(slice)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If RegularHyperSlabNoMalloc where user defined, I don't know how to solve this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you might have noticed I'm not so keen on having both a statically sized version and fixed-sized version. Especially, because it's currently not done for performance reasons and HighFive is very consistent about using std::vector for shapes.
Now for the good news: we could use SFINAE for select allowing users to pass in any type for which hyper_slab.apply(filespace) is valid (and maybe something else). Once, HighFive can accept anything "slab-like", there's no need to implement select specifically for RegularHyperSlabNoMalloc. For now RegularHyperSlabNoMalloc could live outside of HighFive (or in an example). That way even if you fork HighFive, rebasing your changes would be trivial, e.g. if you put RegularHyperSlabNoMalloc in a separate header, the chance of a merge conflict seems zero. Also, not adding it now doesn't mean we can't ever add it.
@antonysigma Would this work for you? The SFINAE part isn't strictly needed, in a first version an unconstrained template parameter would suffice. That way if you don't feel like fighting SFINAE, I can take care of it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re: moving RegularHyperSlabNoMalloc<Rank> to src/examples/ folder. For sure! Please check out the new changes.
Re: SFINAE for select(const T& hyper_slab). Sounds good to me. Please review the new changes following your advises. I suppose the system can tolerate the additional H5Sget_select_npoints calls for now. Once HighFive project migrates to c++17, we have more advanced tricks to compute npoints at compile-time.
Would this work for you? The SFINAE part isn't strictly needed, in a first version an unconstrained template parameter would suffice. That way if you don't feel like fighting SFINAE, I can take care of it later.
Yes, help wanted to tighten the constraints eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll first try with H5Sget_select_npoints if profiling/analysis shows it's bad, then computing the size of the hyperslab will become (optional if need be) part of the HyperSlab interface.
1uc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you not need HyperSlab, i.e. the combination of multiple regular hyperslabs?
|
@antonysigma sorry for the delay. To the best of my knowledge there's very little connection to H5Cpp. HighFive was started by (I believe) a PhD student at BBP. Later the HPC group of BBP took over maintenance and expanded it considerable. Much later I joined and made it more robust and helped or added some advanced features. It's a little surprising that you're able to use HighFive at all in those settings. HighFive doesn't play well with 32 architectures, because it uses The second reason it's surprising is because HighFive is quite liberal about using |
antonysigma
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @1uc ,
Thank you for the code review. I moved RegularHyperSlabNoMalloc<Rank> out of the include/ folder. Please read the inlined comments below.
Do you not need HyperSlab, i.e. the combination of multiple regular hyperslabs?
I am good so far. Almost all of the embedded soft real-time use cases do not require non-regular hyperslab. Actually, Python h5py community reported similar performance regression of union selection versus calling individual select(...).read(...) separately.
To the best of my knowledge there's very little connection to H5Cpp. HighFive was started by (I believe) a PhD student at BBP. Later the HPC group of BBP took over maintenance and expanded it considerable. Much later I joined and made it more robust and helped or added some advanced features.
Thanks. I also tracked down the presentation transcript. Yes, Highfive and h5cpp serves two different users. The former serves 5D (dimensions: XYZTC) microscopy datasets residing in heap space memory, and the latter serves high-frequency trading data (i.e. Tables with scalar values in table cells) residing in stack space memory.
Also adding @steven-varga here for context.
It's a little surprising that you're able to use HighFive at all in those settings. HighFive doesn't play well with 32 architectures, because it uses size_t as if it were equivalent to hsize_t. (Accidentally, that works on 64-bit, but we've had credible reports that it breaks on 32-bit. The naive fix is too invasive; and we've never tried to solve it cleanly without breaking for existing users.)
Nvidia Jetson Nano (maxwell chipset) has 64-bit ARM CPUs. I suppose you mean Xilinx Zynq7010. Ah, that explains a lot about the runtime errors that I experienced with Zynq7010 and Highfive back then. Thank you for the reminder.
The second reason it's surprising is because HighFive is quite liberal about using std::vector for shape. It's everywhere.
Fortunately, my projects does not require full MISRA/JSF compliance.
I only need to demonstrate that the critical IO path contains no operator new or malloc calls via the objdump tool.
Speaking of which, I was expecting C++20's constexpr std::vector<> to eliminate most of the transient memory allocations in HighFive at compile-time. So, it is perfectly fine for us to stay put, and let the ISO-C++ committee do the work for us.
AFAIK, the C++20 compliant compiler eliminates all memory allocations of std::vector at compile-time, if it can prove that:
- all vector objects are constructed and destructed in the function,
- function arguments do not contain pointers or
std::vector::iterator, and - the function does not leak the pointers of the locally constructed
vectorin thereturnstatement, or leak the pointer to anywhere else outside the scope.
| template <typename... Args> | ||
| inline DataSpace::DataSpace(size_t dim1, Args... dims) | ||
| : DataSpace(std::vector<size_t>{dim1, static_cast<size_t>(dims)...}) {} | ||
| : DataSpace(std::array{dim1, static_cast<size_t>(dims)...}) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Help wanted to refactoring the code for c++14.
| auto memspace = DataSpace(std::array<size_t, 1>{n_elements}); | ||
|
|
||
| return detail::make_selection(memspace, filespace, details::get_dataset(slice)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re: moving RegularHyperSlabNoMalloc<Rank> to src/examples/ folder. For sure! Please check out the new changes.
Re: SFINAE for select(const T& hyper_slab). Sounds good to me. Please review the new changes following your advises. I suppose the system can tolerate the additional H5Sget_select_npoints calls for now. Once HighFive project migrates to c++17, we have more advanced tricks to compute npoints at compile-time.
Would this work for you? The SFINAE part isn't strictly needed, in a first version an unconstrained template parameter would suffice. That way if you don't feel like fighting SFINAE, I can take care of it later.
Yes, help wanted to tighten the constraints eventually.
| /* | ||
| * Copyright (c), 2017, Adrien Devresse | ||
| * | ||
| * Distributed under the Boost Software License, Version 1.0. | ||
| * (See accompanying file LICENSE_1_0.txt or copy at | ||
| * http://www.boost.org/LICENSE_1_0.txt) | ||
| * | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind the author to be Adrien, or who ever can compose the CMake file for me. Help wanted here to compose the CMakeLists.txt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say it's either your name or "HighFive developers" and the year is 2025. Let me know which one you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For sure. My name "Antony Chan".
|
@antonysigma before I forget. Combining multiple hyperslabs requires care:
|
The changes are: - Make HyperSlabInterface an CRTP base class. - Make the example shorter, C++14 and rename it.
|
Right, |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
@1uc It works for me. Thank you for revising the code. |
Implement
HighFive::RegularHyperSlabNoMalloc<Rank>that takes hyperslab variables in stack space. Do not constructstd::vector<hsize_t>.Similarly, implement
::select(RegularHyperSlabNoMalloc<>)such that it is free ofoperator new[]calls before invokingH5Sselect_hyperslab.Ensure that when hyperslab offset and range are compile-time constants, compiler (with arguments
-O3 -flto) will inline everything without anyoperator new[]statements.