Add Hash Table Module #125
Adds an open-addressing hash table module with linear probing. Adds a unit test.
```c
BmErr hash_look_up(Hash *hash, uint32_t key, void *data);
BmErr hash_remove(Hash *hash, uint32_t key);
uint16_t hash_get_count(Hash *hash);
uint8_t hash_get_load(Hash *hash);
```
Load factor will be a great metric to track on Bristlemouth networks to help determine how large these hash tables should be for routing purposes.
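If `hash_get_load` reports the load factor as an integer percentage (an assumption; the signature only shows that it returns a `uint8_t`), a metric like this could be computed and checked against the 0.7 threshold discussed in the description. `load_percent` and `load_is_high` are hypothetical helpers, not part of the module:

```c
#include <stdint.h>

/* Hypothetical: load factor (count / capacity) as an integer percentage,
 * truncated, so it fits in a uint8_t. Assumes capacity > 0 and
 * count <= capacity. */
static uint8_t load_percent(uint16_t count, uint16_t capacity) {
    /* Widen before multiplying so count * 100 cannot overflow 16 bits. */
    return (uint8_t)(((uint32_t)count * 100u) / capacity);
}

/* Hypothetical threshold check at a 0.7 load factor (70%). */
static int load_is_high(uint16_t count, uint16_t capacity) {
    return load_percent(count, capacity) >= 70u;
}
```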
Cool stuff! The main thing that occurs to me first is that this can't be the whole solution, since what we're hashing are topic strings and subscriptions can have wildcards. One solution to this is a topic trie. Tries are commonly used for autocomplete, and this hierarchical pub-sub topic routing problem is structured the same way. Mosquitto (a common MQTT pub-sub broker) uses a topic trie. The beginning of the topic string (like …) Our wildcards are different and also new and not widely used yet. As you keep working on the resource-based routing implementation, consider whether we might deprecate certain wildcards and just use whole-segment and whole-subtree wildcards like MQTT. That would mean less flexibility for Bristlemouth developers but more structure for deciding on new topics. It also definitely covers the most commonly useful cases, like using `*` as a whole segment to match any node ID.
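To make the whole-segment / whole-subtree idea concrete, here is a minimal sketch of matching one topic against one subscription filter, assuming MQTT's spelling: `+` matches exactly one segment and `#` matches the remaining subtree (the `*` mentioned above would play the role of `+`). `topic_matches` is a hypothetical name; a trie would apply the same per-segment rule while walking many subscriptions at once instead of testing them one by one.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical MQTT-style filter match on '/'-separated segments:
 *   '+' matches exactly one segment (which may be empty)
 *   '#' matches the rest of the topic and must be the last filter segment */
static bool topic_matches(const char *filter, const char *topic) {
    for (;;) {
        /* Length of the current segment in each string. */
        size_t flen = strcspn(filter, "/");
        size_t tlen = strcspn(topic, "/");

        if (flen == 1 && filter[0] == '#') {
            return filter[1] == '\0';  /* '#' only valid as the final segment */
        }
        if (!(flen == 1 && filter[0] == '+') &&
            (flen != tlen || strncmp(filter, topic, flen) != 0)) {
            return false;  /* literal segment mismatch */
        }

        bool filter_done = filter[flen] == '\0';
        bool topic_done = topic[tlen] == '\0';
        if (topic_done && !filter_done) {
            /* MQTT rule: "a/#" also matches the parent topic "a". */
            return strcmp(filter + flen, "/#") == 0;
        }
        if (filter_done || topic_done) {
            return filter_done && topic_done;
        }
        filter += flen + 1;  /* step past the '/' separators */
        topic += tlen + 1;
    }
}
```

For example, `sensor/+/temp` matches `sensor/node1/temp` but not `sensor/node1/humidity`, and `sensor/#` matches both.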
Another thing I found in my research is that a naive implementation of a trie can consume a bunch of memory for empty nodes. Patricia trees (described in a subsection of the trie Wikipedia article) are designed to efficiently mitigate this problem. And here are a couple of references Claude pointed me to that look instructive, but I haven't read them yet. LC-Trie means "level compressed trie" and LPM means "longest prefix match":
Here's one of the first search results for LC-Trie, which looks brief and helpful: https://raminaji.wordpress.com/lc-trie/ Have fun! ❤️
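One low-effort way to avoid giving every node a fixed array of mostly empty child pointers is to store each node's children as a linked list (first-child / next-sibling). This is a simpler technique than the Patricia tree or LC-trie mentioned above, shown only as a hypothetical sketch: `TrieNode`, `trie_insert`, `trie_contains`, and the `g_node_count` counter are illustrative names, and the trade-off is a linear scan of siblings at each level.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical topic trie keyed on '/'-separated segments. Children are a
 * singly linked list, so a node only pays for children that actually exist. */
typedef struct TrieNode {
    char *segment;             /* one topic segment, e.g. "sensor" */
    struct TrieNode *child;    /* first child */
    struct TrieNode *sibling;  /* next node at the same depth */
    bool is_topic;             /* true if a full topic ends at this node */
} TrieNode;

static size_t g_node_count = 0;  /* nodes allocated, for illustration only */

static TrieNode *node_new(const char *seg, size_t len) {
    TrieNode *n = calloc(1, sizeof *n);
    if (n == NULL) return NULL;
    n->segment = malloc(len + 1);
    if (n->segment == NULL) { free(n); return NULL; }
    memcpy(n->segment, seg, len);
    n->segment[len] = '\0';
    g_node_count++;
    return n;
}

/* Return the child of parent whose segment equals seg[0..len), or NULL. */
static TrieNode *find_child(TrieNode *parent, const char *seg, size_t len) {
    for (TrieNode *c = parent->child; c != NULL; c = c->sibling) {
        if (strlen(c->segment) == len && strncmp(c->segment, seg, len) == 0) {
            return c;
        }
    }
    return NULL;
}

/* Insert a topic; shared prefixes reuse existing nodes. False on OOM. */
static bool trie_insert(TrieNode *root, const char *topic) {
    TrieNode *cur = root;
    for (;;) {
        size_t len = strcspn(topic, "/");
        TrieNode *next = find_child(cur, topic, len);
        if (next == NULL) {
            next = node_new(topic, len);
            if (next == NULL) return false;
            next->sibling = cur->child;  /* push onto the child list */
            cur->child = next;
        }
        cur = next;
        if (topic[len] == '\0') break;
        topic += len + 1;
    }
    cur->is_topic = true;
    return true;
}

/* Exact lookup only; wildcard matching is not handled here. */
static bool trie_contains(TrieNode *root, const char *topic) {
    TrieNode *cur = root;
    for (;;) {
        size_t len = strcspn(topic, "/");
        cur = find_child(cur, topic, len);
        if (cur == NULL) return false;
        if (topic[len] == '\0') return cur->is_topic;
        topic += len + 1;
    }
}
```

Inserting `sensor/node1/temp`, `sensor/node1/humidity`, and `sensor/node2/temp` under a zero-initialized root allocates six nodes rather than nine, because the shared `sensor` and `node1` prefixes are stored once.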
@towynlin Thank you, I really appreciate all of the information! I am currently researching similar implementations and writing a document describing the design path for routing. Wildcard topics were my largest concern as well, and I have realized the mighty trie is most likely the most viable solution! I will share the document with the community once I have finished it. To answer:
I do like having the ability to control our own implementation: if we need to add or reduce functionality, it can be done without patching another codebase.
What changed?
Adds an open-addressing hash table module with linear probing, to be used for Bristlemouth middleware routing.
It handles collisions by linearly searching for the next available slot in the table to insert data into.
Adds a unit test.
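A minimal sketch of insert, look-up, and remove with linear probing, assuming a fixed 16-slot table and `int` values purely for illustration. `LpTable`, `lp_insert`, `lp_lookup`, and `lp_remove` are hypothetical names, not this module's API. Removal marks the slot as a tombstone rather than empty; otherwise a key that a collision pushed past the removed slot would become unreachable.

```c
#include <stdbool.h>
#include <stdint.h>

#define LP_CAPACITY 16u  /* fixed size; no rehashing */

typedef enum { SLOT_EMPTY = 0, SLOT_FULL, SLOT_TOMBSTONE } SlotState;

typedef struct {
    SlotState state[LP_CAPACITY];
    uint32_t key[LP_CAPACITY];
    int value[LP_CAPACITY];
    uint16_t count;
} LpTable;

/* Home slot for a key; a real table would hash the key first. */
static uint32_t lp_home(uint32_t key) { return key % LP_CAPACITY; }

/* Return the index holding key, or -1 if it is absent. */
static int lp_find(const LpTable *t, uint32_t key) {
    uint32_t i = lp_home(key);
    for (uint32_t n = 0; n < LP_CAPACITY; n++, i = (i + 1) % LP_CAPACITY) {
        if (t->state[i] == SLOT_EMPTY) return -1;  /* probe chain ends here */
        if (t->state[i] == SLOT_FULL && t->key[i] == key) return (int)i;
        /* tombstones and other keys: keep walking */
    }
    return -1;  /* probed every slot */
}

static bool lp_insert(LpTable *t, uint32_t key, int value) {
    int found = lp_find(t, key);
    if (found >= 0) { t->value[found] = value; return true; }  /* update */
    uint32_t i = lp_home(key);
    for (uint32_t n = 0; n < LP_CAPACITY; n++, i = (i + 1) % LP_CAPACITY) {
        if (t->state[i] != SLOT_FULL) {  /* empty or tombstone: reuse it */
            t->state[i] = SLOT_FULL;
            t->key[i] = key;
            t->value[i] = value;
            t->count++;
            return true;
        }
    }
    return false;  /* table full */
}

static bool lp_lookup(const LpTable *t, uint32_t key, int *out) {
    int i = lp_find(t, key);
    if (i < 0) return false;
    *out = t->value[i];
    return true;
}

static bool lp_remove(LpTable *t, uint32_t key) {
    int i = lp_find(t, key);
    if (i < 0) return false;
    t->state[i] = SLOT_TOMBSTONE;  /* not EMPTY, or later keys get lost */
    t->count--;
    return true;
}
```

With 16 slots, keys 3, 19, and 35 all have home slot 3 and land in slots 3, 4, and 5. After removing 19, a look-up of 35 still walks past the tombstone in slot 4 and finds it in slot 5.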
How does it make Bristlemouth better?
This is a proposed method for routing in Bristlemouth.
As long as the load factor is below 0.7, the average look-up time complexity is $O(1)$; the worst-case complexity is $O(n)$, which becomes likely if a poor hashing algorithm is chosen.
Look-up complexity can degrade quickly if the load factor increases above 0.7.
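For a sense of how quickly it degrades: the classic analysis of linear probing (Knuth), assuming a hash function that spreads keys uniformly, gives the expected number of probes at load factor $\alpha$ for a successful and an unsuccessful search as

$$C_{\text{hit}}(\alpha) \approx \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right), \qquad C_{\text{miss}}(\alpha) \approx \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)$$

At $\alpha = 0.5$ this is about 1.5 and 2.5 probes; at $\alpha = 0.7$, about 2.17 and 6.06; at $\alpha = 0.9$, about 5.5 and 50.5. The quadratic term in the miss cost is why performance falls off sharply past roughly 0.7.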
Where should reviewers focus?
When thinking of a solution for routing messages on Bristlemouth, I thought a hash table would provide fast look-up times and insertions.
Linear probing was chosen because these tables will most likely be reasonably sized (let's say a maximum of 1024 elements), and even in worst-case scenarios the STM32U5 can iterate through the table quickly.
This paper explains some of the benefits that linear hashing has.
I decided against having the ability to rehash the table because rehashing is a slow operation ($O(n)$) and could introduce latency in look-up times as well as potential memory problems. Tables will also most likely not grow during operation, as nodes will discover resources close to boot time.
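Since the table never rehashes, its capacity has to be chosen up front. A small hypothetical helper (`capacity_for` is an illustrative name) could pick the smallest capacity that keeps a known maximum entry count at or below a target load, then round up to a power of two. The rounding is an assumption, convenient because the slot index can then be computed with a mask instead of a modulo.

```c
#include <stdint.h>

/* Hypothetical sizing helper for a fixed-size (non-rehashing) table.
 * max_load_percent is the target load factor as a percentage, e.g. 70.
 * Assumes max_entries * 100 fits in 32 bits. */
static uint32_t capacity_for(uint32_t max_entries, uint32_t max_load_percent) {
    /* Smallest capacity with max_entries * 100 <= capacity * max_load_percent
     * (integer ceiling division). */
    uint32_t min_cap =
        (max_entries * 100u + max_load_percent - 1u) / max_load_percent;
    /* Round up to a power of two so slot = key & (cap - 1) works. */
    uint32_t cap = 1u;
    while (cap < min_cap) {
        cap <<= 1;
    }
    return cap;
}
```

For the 1024-element example, $1024 / 0.7 \approx 1462.9$, so at least 1463 slots are needed and the power-of-two rounding gives 2048, which keeps a full table at a 0.5 load factor at the cost of the extra slots.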
Let me know your thoughts on this implementation.
I also have not chosen a hashing function yet. I know `bm_protocol` utilizes FNV hashing, which tends to be fairly solid for strings (fast to compute, with well-distributed hash values).
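For reference, a sketch of 32-bit FNV-1a over a topic string, using the standard 32-bit offset basis and prime. This is not copied from `bm_protocol` (which FNV variant it uses isn't stated here), and `fnv1a_32` is a hypothetical name.

```c
#include <stdint.h>

/* 32-bit FNV-1a over a NUL-terminated string. */
static uint32_t fnv1a_32(const char *s) {
    uint32_t hash = 2166136261u;  /* FNV-32 offset basis (0x811C9DC5) */
    while (*s != '\0') {
        hash ^= (uint8_t)*s++;    /* XOR the byte in first: the "1a" order */
        hash *= 16777619u;        /* FNV-32 prime (0x01000193), wraps mod 2^32 */
    }
    return hash;
}
```

The result could serve as the `uint32_t key` passed to `hash_look_up`. Two different topic strings can still map to the same 32-bit key, so the table may need to keep the full topic around to tell them apart.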