Improve performance: Subsets #27

@arthur00

CADIS needs to improve performance to meet demands such as 100 ms ticks.

This ticket is to improve performance for calculating queries in subsets.

Description:

Retrieving sets from the store was improved by maintaining a list of updates per sim. Every attached simulator has its own data structure that tracks which objects were added, updated, or deleted since that simulator's last pull.
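As an illustration of the per-simulator bookkeeping described above, here is a minimal sketch. The class and method names (`ChangeTracker`, `record_insert`, `pull`, and so on) are hypothetical and are not taken from the CADIS code base; the sketch only shows the general idea of accumulating changes between pulls.

```python
class ChangeTracker:
    """Hypothetical per-simulator record of changes since the last pull."""

    def __init__(self):
        self.new = {}         # object id -> object added since the last pull
        self.updated = {}     # object id -> object modified since the last pull
        self.deleted = set()  # ids of objects removed since the last pull

    def record_insert(self, oid, obj):
        if oid in self.deleted:
            # Deleted and re-added between pulls: the simulator still holds
            # the old copy, so report it as an update instead.
            self.deleted.discard(oid)
            self.updated[oid] = obj
        else:
            self.new[oid] = obj

    def record_update(self, oid, obj):
        if oid in self.new:
            self.new[oid] = obj  # still unseen by the simulator
        else:
            self.updated[oid] = obj

    def record_delete(self, oid):
        if oid in self.new:
            del self.new[oid]  # the simulator never saw it; nothing to report
        else:
            self.updated.pop(oid, None)
            self.deleted.add(oid)

    def pull(self):
        """Return the accumulated changes and start a fresh window."""
        changes = (self.new, self.updated, self.deleted)
        self.new, self.updated, self.deleted = {}, {}, set()
        return changes
```

With a structure like this, a pull costs time proportional to the number of changes rather than to the size of the set.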

With subsets, this is not possible. With every pull of a subset, the query needs to be re-executed, and the whole subset is sent back to the client. There are two issues with this approach:

  1. Executing the query takes time. In the committed example, EmptyBusiness is a subset of BusinessNode: an EmptyBusiness is any BusinessNode that has no Person object attached to it (through the Person's foreign key 'EmployedBy'). To calculate this, we need to do:

    def query(store):
        # Copies of every BusinessNode and Person currently in the store.
        bns = store.get(BusinessNode, False)
        ppl = store.get(Person, False)
        res = []
        for b in bns:
            occupied = False
            for p in ppl:
                if p.EmployedBy == b.Name:
                    occupied = True
                    break  # one employee is enough; stop scanning
            if not occupied:
                res.append(b)
        return res

    This is an O(n^2) search (every BusinessNode is compared against every Person), and it requires copying two large arrays of objects (BusinessNode and Person). In tests with about 50 Person and 50 BusinessNode objects, this took at least 200 ms to run.

    How can we improve this performance? Some suggestions:

    1. Improve how we copy/construct objects. This would also help the performance of store.get on thousands of objects, which also takes seconds to run.
    2. Cache results and send only the deltas of the subset.
    3. Assuming subset queries always return a collection of the same object type, send only IDs on the pull. This would save time otherwise spent sending a lot of data (particularly over a network), but would still have the problem from suggestion 1: thousands of objects must be copied in memory at every tick.
  2. The simulator might be interested in this set only once or twice. In that case we recalculate the subset on every tick, spending time on a result that is rarely used. To address this, patch cc7bd7b added a disabled flag that lets the simulator disable the subset pull, so the store can skip recalculating the query every tick. However, this is an inelegant solution. Can we do better?
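To make suggestions 2 and 3 more concrete, here is a rough sketch, not a proposed implementation. It assumes that `Name` uniquely identifies a BusinessNode; the `namedtuple` types stand in for the real CADIS classes, and `empty_businesses` and `SubsetCache` are hypothetical names. The query builds a set of employer names once, so the subset is computed in roughly linear time instead of with nested loops, and the cache reports only the IDs that entered or left the subset since the previous pull.

```python
from collections import namedtuple

# Stand-ins for the real CADIS types (assumption: Name is a unique ID).
Person = namedtuple("Person", ["Name", "EmployedBy"])
BusinessNode = namedtuple("BusinessNode", ["Name"])


def empty_businesses(businesses, people):
    """Return the businesses that no person is employed by."""
    employers = {p.EmployedBy for p in people}  # one pass over people
    return [b for b in businesses if b.Name not in employers]


class SubsetCache:
    """Hypothetical cache that remembers the IDs sent on the last pull."""

    def __init__(self):
        self.previous_ids = set()

    def delta(self, current):
        """Return (added_ids, removed_ids) relative to the last pull."""
        current_ids = {b.Name for b in current}
        added = current_ids - self.previous_ids
        removed = self.previous_ids - current_ids
        self.previous_ids = current_ids
        return added, removed
```

Note that this delta only tracks membership. If a member's other fields change while it stays in the subset, that change would still need to be reported some other way.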
