perf: improve mapping lookup performance #144

Draft
Malte Janz (MalteJanz) wants to merge 2 commits into trunk from perf/improve-mapping-lookup-performance

Malte Janz (MalteJanz) commented Feb 27, 2026

This is just an experiment / proof of concept (PoC) for now, created on a circle day.

Context

  • In our Migration Assistant, Converter classes map data from the source system to the SW6 schema.
  • Conversion runs in batches of 100 entities (e.g. 100 products), and the corresponding Converter's convert method is called once per entity.
  • During conversion, the Converter has to map foreign keys in many places (e.g. a manufacturerId of 42 in SW5 to the correct SW6 UUID). This is where the MappingService comes in.
  • Right now it performs one DB query for every mapping lookup and caches the results in memory. However, the chance of a cache miss is assumed to be rather high.
  • Issuing many small DB queries is slow; this pattern is commonly called the N+1 query problem.
  • It would be nice to fetch all necessary mappings in a single DB query, which should improve performance significantly
  • and to do so without making the converter logic too complex.
    • For example, we previously added an optimization that "remembers" all mappings used by an entity and stores them, so they are preloaded in future migrations. This was implemented manually, so it is easy to forget and then not benefit from. I would like to get rid of this extra complexity, which only helps from the second migration of the same entity onward, and instead provide a simpler solution that works every time.
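To make the difference concrete, here is a minimal sketch (assuming PHP 8) that counts round trips for a cached one-query-per-miss lookup versus a single batched lookup. `FakeMappingTable`, `lookupOneByOne` and `lookupBatched` are hypothetical names, and the in-memory table only stands in for the real mapping table; none of this is the MappingService API.

```php
<?php declare(strict_types=1);

// Hypothetical in-memory stand-in for the mapping table that counts round trips.
final class FakeMappingTable
{
    public int $queries = 0;

    /** @param array<string, string> $rows old ID => new SW6 UUID */
    public function __construct(private array $rows)
    {
    }

    /**
     * Simulates ONE DB query, however many IDs are requested.
     *
     * @param list<string> $oldIds
     * @return array<string, string> only the IDs that have a mapping
     */
    public function fetch(array $oldIds): array
    {
        $this->queries++;

        return array_intersect_key($this->rows, array_flip($oldIds));
    }
}

/** N+1 style: one query per cache miss, results cached in memory. */
function lookupOneByOne(FakeMappingTable $db, array $oldIds): array
{
    $cache = [];
    foreach ($oldIds as $id) {
        if (!array_key_exists($id, $cache)) {
            $cache[$id] = $db->fetch([$id])[$id] ?? null;
        }
    }

    return $cache;
}

/** Batched style: collect the distinct IDs first, then run a single query. */
function lookupBatched(FakeMappingTable $db, array $oldIds): array
{
    $found = $db->fetch(array_values(array_unique($oldIds)));
    $result = [];
    foreach ($oldIds as $id) {
        $result[$id] = $found[$id] ?? null;
    }

    return $result;
}

$ids = ['42', '7', '42', '13', '7'];
$db1 = new FakeMappingTable(['42' => 'uuid-a', '7' => 'uuid-b']);
$db2 = new FakeMappingTable(['42' => 'uuid-a', '7' => 'uuid-b']);
$oneByOne = lookupOneByOne($db1, $ids);
$batched = lookupBatched($db2, $ids);

// prints: one-by-one: 3 queries, batched: 1 query
echo "one-by-one: {$db1->queries} queries, batched: {$db2->queries} query\n";
```

With five lookups over three distinct IDs, the cached variant still pays one query per miss (including the ID that has no mapping), while the batched variant pays one query in total and returns the same result.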

Baseline benchmark

I explained the environment for these benchmarks in a previous PR, so I'll only focus on the results here:

Perfect environment

Let's start by looking at a typical product conversion batch of 100 entities:
https://blackfire.io/profiles/37d53432-4c24-4f51-b79e-507449c25ee6/graph

The SwagMigrationAssistant\Migration\Mapping\MappingService::getMapping method:

  • has a total wall time of 477 ms and accounts for 23.67% of the total wall time of the process
  • is called 3047 times
  • queries the DB 1263 times

Keep in mind, though, that in my dev environment the DB runs on the same machine as the message worker PHP process, so the network latency is near zero.

Let's see how this looks if we add a small delay of 1 ms every time the mapping method reaches the DB (by adding a usleep(1000) call in the non-cached code path).
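The latency simulation can be sketched like this. `CachedMappingService` is a hypothetical simplification, not the real MappingService signature: only cache misses go to the "DB", and each miss pays a simulated round trip through usleep(), as in the benchmark.

```php
<?php declare(strict_types=1);

// Hypothetical simplification of a cached getMapping(); not the real MappingService API.
final class CachedMappingService
{
    public int $dbCalls = 0;

    /** @var array<string, ?string> cached results, including misses (null) */
    private array $cache = [];

    /** @param array<string, string> $table "entity|oldId" => new UUID */
    public function __construct(private array $table, private int $latencyMicros = 1000)
    {
    }

    public function getMapping(string $entity, string $oldId): ?string
    {
        $key = $entity . '|' . $oldId;
        if (array_key_exists($key, $this->cache)) {
            return $this->cache[$key]; // cache hit: no round trip
        }

        $this->dbCalls++;
        usleep($this->latencyMicros); // simulated network latency (1 ms by default)

        return $this->cache[$key] = $this->table[$key] ?? null;
    }
}

$service = new CachedMappingService(['manufacturer|42' => 'uuid-42']);
$start = hrtime(true); // monotonic clock in nanoseconds
foreach (['42', '42', '7', '42', '7'] as $oldId) {
    $service->getMapping('manufacturer', $oldId);
}
$elapsedMs = (hrtime(true) - $start) / 1e6;

printf("%d DB calls, %.1f ms\n", $service->dbCalls, $elapsedMs);
```

Only the two misses sleep, so the added wait grows with the number of cache misses rather than with the number of getMapping calls.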

Simulating production

1 ms is just an assumption; ChatGPT gave me these typical round-trip times (RTT):

  • Same host: ~0.01–0.1 ms
  • Same availability zone in AWS: ~0.2–0.8 ms (already ~5–20x slower than local loopback)
  • Same region but different zone in AWS: ~1–3 ms (~10–100x higher than local)

With that 1 ms delay I get this result:
https://blackfire.io/profiles/57a1ea2b-0332-4584-a320-47d89942d400/graph

Notice how different the situation looks now for the getMapping method:

  • has a total wall time of 2.28 s and accounts for 50.36% of the total wall time of the process
  • is still called the same 3047 times
  • still queries the DB the same 1263 times, each time also calling usleep(1000) to simulate network latency

This makes it obvious why the N+1 query problem is such a big deal for any application that talks to a DB in production.
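A rough back-of-the-envelope estimate, assuming each mapping query costs one network round trip of latency $t_{\text{RTT}}$ and ignoring query execution time, using the 1263 DB calls per batch measured above:

```latex
\begin{aligned}
T_{\text{wait}} &\approx N_{\text{queries}} \cdot t_{\text{RTT}} \\
\text{one query per miss, } t_{\text{RTT}} = 1\,\text{ms}: \quad T_{\text{wait}} &\approx 1263 \cdot 1\,\text{ms} \approx 1.26\,\text{s} \\
\text{one query per miss, same AZ: } \quad T_{\text{wait}} &\approx 1263 \cdot (0.2 \text{ to } 0.8)\,\text{ms} \approx 0.25 \text{ to } 1.01\,\text{s} \\
\text{one batched query: } \quad T_{\text{wait}} &\approx 1 \cdot 1\,\text{ms} = 1\,\text{ms}
\end{aligned}
```

The measured increase in getMapping wall time (2.28 s − 0.477 s ≈ 1.80 s) is above the 1.26 s estimate. That is plausible, since usleep() only guarantees a minimum sleep duration and profiling adds overhead, so the estimate is best read as a lower bound rather than a prediction.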

Proposed solution

To keep the converter logic mostly untouched and easy to read, I propose a concept similar to asynchronous programming (like Promises in the JS world).

The idea is simple:

  • Each getMapping call initially returns a placeholder / promise, but registers a task to look up a certain mapping from the DB
  • At the end, we can fetch all registered mappings from the DB in a single query
  • Then we replace the placeholders / fulfill the promises with the real data
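A minimal sketch of those three steps, using hypothetical `MappingPromise` and `BatchedMappingLookup` classes (not part of the Migration Assistant). It fulfills each promise through a plain callback, which is only one possible design.

```php
<?php declare(strict_types=1);

// Hypothetical promise-like placeholder for a single mapping lookup.
final class MappingPromise
{
    private ?string $value = null;
    private bool $settled = false;

    /** @var list<callable(?string): void> */
    private array $callbacks = [];

    /** Registers what to do once the mapping is known. */
    public function then(callable $onFulfilled): void
    {
        if ($this->settled) {
            $onFulfilled($this->value);

            return;
        }
        $this->callbacks[] = $onFulfilled;
    }

    public function fulfill(?string $value): void
    {
        $this->value = $value;
        $this->settled = true;
        foreach ($this->callbacks as $callback) {
            $callback($value);
        }
        $this->callbacks = [];
    }
}

// Hypothetical collector: getMapping() registers (step 1), flush() fetches and fulfills (steps 2 and 3).
final class BatchedMappingLookup
{
    /** @var array<string, MappingPromise> one shared promise per distinct old ID */
    private array $promises = [];

    public function getMapping(string $oldId): MappingPromise
    {
        return $this->promises[$oldId] ??= new MappingPromise();
    }

    /** @param callable(list<string>): array<string, string> $fetchMany one DB query for all IDs */
    public function flush(callable $fetchMany): void
    {
        // Numeric-string keys become ints in PHP arrays, so convert them back.
        $found = $fetchMany(array_map('strval', array_keys($this->promises)));
        foreach ($this->promises as $oldId => $promise) {
            $promise->fulfill($found[$oldId] ?? null);
        }
        $this->promises = [];
    }
}

$queries = 0;
$lookup = new BatchedMappingLookup();
$product = ['name' => 'Example'];

$lookup->getMapping('42')->then(function (?string $uuid) use (&$product): void {
    $product['manufacturerId'] = $uuid;
});
$lookup->getMapping('7')->then(function (?string $uuid) use (&$product): void {
    $product['taxId'] = $uuid;
});

$lookup->flush(function (array $oldIds) use (&$queries): array {
    $queries++; // a single round trip for every registered lookup

    return array_intersect_key(['42' => 'uuid-42'], array_flip($oldIds));
});
```

In this style the converter code has to move into callbacks, which changes it more than the "mostly untouched" goal stated above would like.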

One way to implement this would be to store absolute paths into the nested $converted array to remember which string needs to be updated, but that would be really cumbersome and error-prone to use.
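For comparison, a minimal sketch of the path-based variant. The helper `setByPath` and the data layout are hypothetical. Every deferred lookup has to record its full key path into `$converted`, and a stale path would silently write to the wrong place.

```php
<?php declare(strict_types=1);

/** Writes $value into a nested array at the given key path (creating missing levels). */
function setByPath(array &$data, array $path, mixed $value): void
{
    $node = &$data;
    foreach ($path as $key) {
        $node = &$node[$key]; // walk one level deeper, keeping a reference
    }
    $node = $value;
}

$converted = ['name' => 'Example', 'manufacturer' => ['id' => null]];

// Each deferred lookup records: old ID => list of key paths to fill in later.
$pendingPaths = ['42' => [['manufacturer', 'id']]];

// Pretend result of the single batched query.
$found = ['42' => 'uuid-42'];

foreach ($pendingPaths as $oldId => $paths) {
    foreach ($paths as $path) {
        setByPath($converted, $path, $found[$oldId] ?? null);
    }
}
```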

Fortunately, PHP is quite powerful here and supports variable references (aliases). I experimented with using them a bit like pointers in lower-level languages, and to some degree that seems to work (see mapping-experiment.php).
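A minimal sketch of the reference idea, written independently and not a reproduction of mapping-experiment.php; `DeferredMappingResolver`, `defer()` and `resolve()` are hypothetical names. Each lookup stores a reference to the array slot it should fill, and one resolve() call fills all of them after a single batched fetch. It relies on documented PHP behavior: references inside arrays are preserved when the array is copied by value, which is why `$converted[] = $product` still sees the later update.

```php
<?php declare(strict_types=1);

// Hypothetical sketch: lookups bind a placeholder slot by reference and are
// resolved together after one batched fetch.
final class DeferredMappingResolver
{
    /** @var list<array{key: string, target: mixed}> 'target' holds a reference */
    private array $pending = [];

    /** Registers a lookup; $target stays null until resolve() runs. */
    public function defer(string $oldId, mixed &$target): void
    {
        $target = null;
        $this->pending[] = ['key' => $oldId, 'target' => &$target];
    }

    /** @param callable(list<string>): array<string, string> $fetchMany one DB query for all IDs */
    public function resolve(callable $fetchMany): void
    {
        $keys = array_values(array_unique(array_column($this->pending, 'key')));
        $found = $fetchMany($keys);

        foreach ($this->pending as &$entry) {
            // Assigning through the stored reference updates the caller's array slot.
            $entry['target'] = $found[$entry['key']] ?? null;
        }
        unset($entry); // break the foreach reference

        $this->pending = []; // drop all references once they are fulfilled
    }
}

$queries = 0;
$fetchMany = function (array $oldIds) use (&$queries): array {
    $queries++; // one round trip for the whole batch

    return array_intersect_key(['42' => 'uuid-manufacturer-42', '7' => 'uuid-tax-7'], array_flip($oldIds));
};

$resolver = new DeferredMappingResolver();
$converted = [];
foreach ([['42', '7'], ['42', '99']] as $i => [$manufacturerId, $taxId]) {
    $product = ['name' => "Product $i"];
    $resolver->defer($manufacturerId, $product['manufacturerId']);
    $resolver->defer($taxId, $product['taxId']);
    $converted[] = $product; // by-value copy, yet the referenced slots stay shared
}

$resolver->resolve($fetchMany);
```

Compared with callbacks, the converter keeps its plain array-building shape. The cost is that correctness depends on PHP's reference semantics; for example, calling unset() on a pending slot before resolve() detaches it, so that field is never filled. That is worth covering with tests if this approach is pursued.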

Result benchmark

TBA

@@ -0,0 +1,168 @@
<?php declare(strict_types=1);

// Todo: do not merge this experiment

Todo: remove this and document the approach properly somewhere else

Malte Janz (MalteJanz), Feb 27, 2026:

I didn't get very far today implementing this as an actual MappingServiceV2 and using it in our ProductConverter to validate the idea further.

But I'm still curious what others would think about this rather unusual approach 🙂
