Description
Encountered an interesting bug while trying to port wasmtime to illumos, as documented in bytecodealliance/wasmtime#9535.
Steps to reproduce (note the +beta toolchain: compiling wasmtime on illumos requires Rust 1.83 or above):
git clone https://github.com/bytecodealliance/wasmtime
cd wasmtime
git checkout 44da05665466edb301558aa617d9a7bff295c461
git submodule init
git submodule update --recursive
cargo +beta test --test wast -- --test-threads 1 Cranelift/pooling/tests/spec_testsuite/load.wast
This takes around 0.07 seconds on Linux but around 5-6 seconds on illumos.
DTrace samples:
- User stacks: https://gist.github.com/sunshowers/b69b7bd2e671d9c23355d5e952636c5e
- Kernel stacks: https://gist.github.com/sunshowers/fa822f161e54d57a8103f6736656fbe8
From my naive reading of the kernel stacks in particular, it seems that most of the time is spent waiting on various locks.
Per Alex Crichton in this comment:
Whoa! It looks like the pooling allocator is the part that's slow here and that, by default, has a large number of virtual memory mappings associated with it. For example it'll allocate terabytes of virtual memory and then within that giant chunk it'll slice up roughly 10_000 linear memories (each with guard regions between them). These are prepared with a MemoryImageSlot each.
My guess is that the way things are managed is tuned to "this is acceptable due to some fast path in Linux we're hitting", which we didn't really design for and just happened to run across.
This corresponds to the PoolingInstanceAllocator in wasmtime. Alex suggests tweaking how the allocator works, either on illumos specifically or in general, but given the size of the gap between illumos and Linux, it seems a kernel-level improvement might help as well.
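For illustration, here is a minimal sketch of what such a tweak could look like from an embedder's side: shrinking the pool so far less virtual address space is reserved up front. This assumes the recent PoolingAllocationConfig API in the wasmtime crate (method names like total_memories and total_core_instances, which may differ across versions); the values are made up for illustration and are not what the wast test harness actually configures.

```rust
use wasmtime::{Config, Engine, InstanceAllocationStrategy, PoolingAllocationConfig};

fn main() -> wasmtime::Result<()> {
    // Sketch: shrink the pooling allocator so it sets up far fewer slots.
    // With the defaults Alex describes (~10_000 slots, each a multi-GiB
    // reservation plus guard region), the engine maps tens of terabytes of
    // virtual address space up front. The knob names and values below are
    // assumptions, not a verified fix.
    let mut pool = PoolingAllocationConfig::new();
    pool.total_memories(100); // illustrative value; the default is far larger
    pool.total_core_instances(100); // illustrative value
    let mut config = Config::new();
    config.allocation_strategy(InstanceAllocationStrategy::Pooling(pool));
    let _engine = Engine::new(&config)?;
    Ok(())
}
```

Whether shrinking the pool merely hides the cost rather than addressing the underlying mapping/locking behavior on illumos is exactly the open question.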
cc @iximeow and @rmustacc, whom I briefly chatted with about this.