Description
Encountered an interesting bug while trying to port wasmtime to illumos, as documented in bytecodealliance/wasmtime#9535.
Steps to reproduce (note the +beta toolchain: compiling wasmtime on illumos requires Rust 1.83 or above):
git clone https://github.com/bytecodealliance/wasmtime
cd wasmtime
git checkout 44da05665466edb301558aa617d9a7bff295c461
git submodule init
git submodule update --recursive
cargo +beta test --test wast -- --test-threads 1 Cranelift/pooling/tests/spec_testsuite/load.wast
This takes around 0.07 seconds on Linux but around 5-6 seconds on illumos.
DTrace samples:
- User stacks: https://gist.github.com/sunshowers/b69b7bd2e671d9c23355d5e952636c5e
- Kernel stacks: https://gist.github.com/sunshowers/fa822f161e54d57a8103f6736656fbe8
From my naive reading of the kernel stacks in particular, it seems that most of the time is spent waiting on various locks.
Per Alex Crichton in this comment:
Whoa! It looks like the pooling allocator is the part that's slow here and that, by default, has a large number of virtual memory mappings associated with it. For example it'll allocate terabytes of virtual memory and then within that giant chunk it'll slice up roughly 10_000 linear memories (each with guard regions between them). These are prepared with a MemoryImageSlot each.
My guess is that the way things are managed is tuned to "this is acceptable due to some fast path in Linux we're hitting", which we didn't really design for and just happened to run across.
This corresponds to the PoolingInstanceAllocator in wasmtime. Alex suggests tweaking how the allocator works, either on illumos specifically or in general, but given the size of the gap between illumos and Linux, it seems a kernel-level improvement might help as well.
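For illustration, here is a minimal sketch of what such a tweak could look like from an embedder's side: shrinking the pool so far less virtual address space is reserved up front. This assumes the recent PoolingAllocationConfig API in the wasmtime crate (method names like total_memories and total_core_instances, which may differ across versions); the values are made up for illustration and are not what the wast test harness actually configures.

```rust
use wasmtime::{Config, Engine, InstanceAllocationStrategy, PoolingAllocationConfig};

fn main() -> wasmtime::Result<()> {
    // Sketch: shrink the pooling allocator so it sets up far fewer slots.
    // With the defaults Alex describes (~10_000 slots, each a multi-GiB
    // reservation plus guard region), the engine maps tens of terabytes of
    // virtual address space up front. The knob names and values below are
    // assumptions, not a verified fix.
    let mut pool = PoolingAllocationConfig::new();
    pool.total_memories(100); // illustrative value; the default is far larger
    pool.total_core_instances(100); // illustrative value
    let mut config = Config::new();
    config.allocation_strategy(InstanceAllocationStrategy::Pooling(pool));
    let _engine = Engine::new(&config)?;
    Ok(())
}
```

Whether shrinking the pool merely hides the cost rather than addressing the underlying mapping/locking behavior on illumos is exactly the open question.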
cc @iximeow and @rmustacc, whom I briefly chatted with about this.