Improve performance: Subsets

CADIS needs to improve performance to meet demands such as 100 ms ticks. 

This ticket is to improve performance for calculating queries in subsets. 

**Description:**

Retrieving sets from the store was improved by maintaining a list of updates per sim. Every attached simulator has its own data structure that keeps track of which objects have been created, updated, or deleted since the last pull.

With subsets, this is not possible. With every pull of a subset, the query needs to be re-executed, and the whole subset is sent back to the client. There are two issues with this approach:
1. Executing the query takes time. In the committed example, EmptyBusiness is a subset of BusinessNode. An EmptyBusiness is any BusinessNode that has no Person object attached to it (through the Person's foreign key 'EmployedBy'). To calculate this, we need to do:
   
   ``` python
   def query(store):
       # Return every BusinessNode that no Person references via EmployedBy.
       bns = store.get(BusinessNode, False)
       ppl = store.get(Person, False)
       res = []
       for b in bns:
           occupied = False
           for p in ppl:
               if p.EmployedBy == b.Name:
                   occupied = True
                   break  # one match is enough; stop scanning people
           if not occupied:
               res.append(b)
       return res
   ```
   
   This is an O(n^2) search (every BusinessNode is compared against every Person), and it requires copying two large arrays of objects (BusinessNode and Person). In tests with about 50 Person and BusinessNode objects each, this took at least 200 ms to run.
   
   How can we improve this performance? Some suggestions:
   1. Improve how we copy/construct objects. This would also help performance with store.get of thousands of objects, which also takes seconds to run.
   2. Cache results and send deltas of the subset.
   3. Assuming subset queries always return a collection of the same object type, send only IDs on the pull. This would save time sending a lot of data (particularly over a network), but would still have the issue from suggestion 1: thousands of objects would need to be copied in memory at every tick.
2. The simulator might be interested in this set only once or twice. Thus we are constantly recalculating a subset, which takes time, without really using it. To address this problem, patch cc7bd7b64021eaa5c0b3bf2ef0a1eef2ff0dc233 added a disabled flag that lets the simulator disable subset pull, so the store can avoid recalculating the query every tick. However, this is an inelegant solution. Can we do better?
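
As a starting point for discussion, here is a rough sketch of suggestion 2 (cache results and send deltas), combined with a single-pass version of the EmptyBusiness query. Everything in it is an assumption for illustration: the `BusinessNode` and `Person` dataclasses are simplified stand-ins for the real types, `empty_businesses` and `SubsetCache` are hypothetical names that do not exist in CADIS, and `Name` is treated as a unique ID because the original query joins on it.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the real CADIS types; only the fields used here.
@dataclass(frozen=True)
class BusinessNode:
    Name: str


@dataclass(frozen=True)
class Person:
    Name: str
    EmployedBy: str


def empty_businesses(business_nodes, people):
    """Return the BusinessNodes that no Person references via EmployedBy."""
    # One pass over people to collect employer names, then one pass over
    # the nodes with constant-time set lookups, instead of a nested loop.
    employers = {p.EmployedBy for p in people}
    return [b for b in business_nodes if b.Name not in employers]


class SubsetCache:
    """Remembers which IDs were in the subset on the previous pull."""

    def __init__(self):
        self._last_ids = set()

    def delta(self, current):
        """Return (added objects, removed IDs) relative to the last pull."""
        current_by_id = {b.Name: b for b in current}  # assumes Name is unique
        current_ids = set(current_by_id)
        added = [current_by_id[i] for i in sorted(current_ids - self._last_ids)]
        removed = sorted(self._last_ids - current_ids)
        self._last_ids = current_ids
        return added, removed


# Example: two pulls by one simulator.
nodes = [BusinessNode("A"), BusinessNode("B"), BusinessNode("C")]
people = [Person("p1", "A")]
cache = SubsetCache()

first_added, first_removed = cache.delta(empty_businesses(nodes, people))

people.append(Person("p2", "B"))  # B gains an employee between pulls
second_added, second_removed = cache.delta(empty_businesses(nodes, people))
```

On the first pull the whole subset is reported as added; on the second, only the ID of the node that left the subset is sent. This sketch tracks membership only: an object that stays in the subset but changes its other fields would not be reported, so a real implementation would need one cache per attached simulator and some way to detect updates. It also still re-runs the query on every pull, so it reduces the data sent rather than the cost raised in issue 2.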

