
wasmtime slow due to lock contention in kernel during munmap() #177

@sunshowers

Description


Encountered an interesting bug while trying to port wasmtime to illumos, as documented in bytecodealliance/wasmtime#9535.

STR (note the +beta: compiling wasmtime on illumos requires Rust 1.83 or above):

git clone https://github.com/bytecodealliance/wasmtime
cd wasmtime
git checkout 44da05665466edb301558aa617d9a7bff295c461
git submodule init
git submodule update --recursive

cargo +beta test --test wast -- --test-threads 1 Cranelift/pooling/tests/spec_testsuite/load.wast

This takes around 0.07 seconds on Linux but around 5-6 seconds on illumos.

DTrace samples:

From my naive reading of the kernel stacks in particular, it seems like most of the time is being spent waiting on locks to various degrees.

Per Alex Crichton in this comment:

Whoa! It looks like the pooling allocator is the part that's slow here and that, by default, has a large number of virtual memory mappings associated with it. For example it'll allocate terabytes of virtual memory and then within that giant chunk it'll slice up roughly 10_000 linear memories (each with guard regions between them). These are prepared with a MemoryImageSlot each.

My guess is that the way things are managed is tuned to "this is acceptable due to some fast path in Linux we're hitting" which we didn't really design for and just happened to run across.

This corresponds to PoolingInstanceAllocator in wasmtime. Alex suggests possibly tweaking how the allocator works either on illumos or generally, but given the performance difference between illumos and Linux it seems that a kernel-level improvement might help.

cc @iximeow, @rmustacc who I briefly chatted with about this.
