
Add Hash Table Module #125

Open
matt001k wants to merge 2 commits into main from feature/hash_table

@matt001k
Contributor

What changed?

Adds an open addressing hash table module with linear probing, to be used with Bristlemouth middleware routing.
The table handles collisions by searching linearly for the next available slot to insert data into.
Adds a unit test.
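
For readers unfamiliar with the technique, here is a minimal sketch of open addressing with linear probing over statically allocated storage. It is not the code in this PR: the `Demo*` and `demo_*` names, the capacity of 8, the use of the key itself as the hash, and the tombstone approach to removal are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CAPACITY 8u /* hypothetical fixed size, no heap needed */

typedef enum { SLOT_EMPTY = 0, SLOT_USED, SLOT_DELETED } DemoSlotState;

typedef struct {
  uint32_t key;
  uint32_t value;
  DemoSlotState state;
} DemoSlot;

typedef struct {
  DemoSlot slots[DEMO_CAPACITY];
  uint16_t count;
} DemoTable;

/* Slot visited on the i-th probe for a key (assumption: key is its own hash). */
static DemoSlot *demo_probe(DemoTable *t, uint32_t key, uint32_t i) {
  return &t->slots[(key % DEMO_CAPACITY + i) % DEMO_CAPACITY];
}

/* Insert or update a key. Returns false when no slot is available. */
static bool demo_insert(DemoTable *t, uint32_t key, uint32_t value) {
  DemoSlot *free_slot = NULL;
  for (uint32_t i = 0; i < DEMO_CAPACITY; i++) {
    DemoSlot *s = demo_probe(t, key, i);
    if (s->state == SLOT_USED) {
      if (s->key == key) {
        s->value = value; /* key already present: update in place */
        return true;
      }
    } else {
      if (free_slot == NULL) free_slot = s; /* remember first reusable slot */
      if (s->state == SLOT_EMPTY) break;    /* key cannot live past here */
    }
  }
  if (free_slot == NULL) return false;
  free_slot->key = key;
  free_slot->value = value;
  free_slot->state = SLOT_USED;
  t->count++;
  return true;
}

/* Look a key up. An empty slot ends the probe sequence; a tombstone does not. */
static bool demo_look_up(DemoTable *t, uint32_t key, uint32_t *value) {
  for (uint32_t i = 0; i < DEMO_CAPACITY; i++) {
    DemoSlot *s = demo_probe(t, key, i);
    if (s->state == SLOT_EMPTY) return false;
    if (s->state == SLOT_USED && s->key == key) {
      *value = s->value;
      return true;
    }
  }
  return false;
}

/* Remove a key by leaving a tombstone so later entries stay reachable. */
static bool demo_remove(DemoTable *t, uint32_t key) {
  for (uint32_t i = 0; i < DEMO_CAPACITY; i++) {
    DemoSlot *s = demo_probe(t, key, i);
    if (s->state == SLOT_EMPTY) return false;
    if (s->state == SLOT_USED && s->key == key) {
      s->state = SLOT_DELETED;
      t->count--;
      return true;
    }
  }
  return false;
}
```

In this sketch, keys 1 and 9 both land on slot 1, so the second insert moves on to slot 2. Removing key 1 leaves a tombstone, which is why a later lookup of key 9 still succeeds.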

How does it make Bristlemouth better?

This is a proposed method to do routing in Bristlemouth.
As long as the load factor stays below 0.7, the average lookup time complexity is $$O(1)$$. The worst case is $$O(n)$$, which can happen if a poor hashing algorithm is chosen.
Lookup performance can degrade quickly once the load factor rises above 0.7.
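
To put rough numbers on that degradation, the sketch below evaluates Knuth's classic approximations for linear probing under uniform hashing: about ½(1 + 1/(1 − α)) probes for a successful lookup and ½(1 + 1/(1 − α)²) for an unsuccessful one, where α is the load factor. These are textbook estimates, not measurements of this module, and the function names are hypothetical.

```c
#include <stdio.h>

/* Expected probes for a successful lookup at load factor `load` (0 <= load < 1). */
static double probes_hit(double load) {
  return 0.5 * (1.0 + 1.0 / (1.0 - load));
}

/* Expected probes for an unsuccessful lookup (or an insert) at the same load. */
static double probes_miss(double load) {
  double free_fraction = 1.0 - load;
  return 0.5 * (1.0 + 1.0 / (free_fraction * free_fraction));
}

/* Print the estimates for a few representative load factors. */
static void print_probe_estimates(void) {
  const double loads[] = {0.5, 0.7, 0.9};
  for (unsigned i = 0; i < sizeof(loads) / sizeof(loads[0]); i++) {
    printf("load %.1f: hit %.2f probes, miss %.2f probes\n", loads[i],
           probes_hit(loads[i]), probes_miss(loads[i]));
  }
}
```

Under these approximations a miss costs about 6 probes at a load factor of 0.7 but about 50 at 0.9, which is the kind of cliff the 0.7 guideline is meant to avoid.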

Where should reviewers focus?

When thinking about how to route messages on Bristlemouth, I thought a hash table would provide fast lookups and insertions.
Linear probing was chosen because these tables will most likely be reasonably sized (let's say 1024 elements at most), and even in the worst case the STM32U5 can fly through iterating over the table.
This paper explains some of the benefits that linear hashing has.
I decided against supporting rehashing because it is a slow operation ($$O(n)$$) that could add latency to lookups and cause memory problems. The tables will also most likely not grow during operation, as nodes will discover resources close to boot time.

Let me know your thoughts on this implementation.
I also have not chosen a hashing function yet. I know bm_protocol uses FNV hashing, which tends to be fairly solid (fast, with few collisions) for strings.
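
As a reference point for that choice, here is a minimal sketch of the 32-bit FNV-1a variant, using the published offset basis and prime. Which FNV variant bm_protocol uses is not stated here, so picking FNV-1a is an assumption, and `fnv1a_32` is a hypothetical name.

```c
#include <stdint.h>

#define FNV1A_32_OFFSET_BASIS 2166136261u /* 0x811C9DC5 */
#define FNV1A_32_PRIME 16777619u          /* 0x01000193 */

/* Hash a NUL-terminated string with 32-bit FNV-1a: XOR each byte, then multiply. */
static uint32_t fnv1a_32(const char *str) {
  uint32_t hash = FNV1A_32_OFFSET_BASIS;
  for (const unsigned char *p = (const unsigned char *)str; *p != '\0'; p++) {
    hash ^= *p;
    hash *= FNV1A_32_PRIME; /* unsigned overflow wraps modulo 2^32 */
  }
  return hash;
}
```

A value like `fnv1a_32(topic)` could then be passed wherever the module expects a `uint32_t` key, with the slot index derived from it by the table.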

Checklist

  • Add or update unit tests for changed code
  • Ensure all submodules up to date. If this PR relies on changes in submodules, merge those PRs first, then point this PR at/after the merge commit
  • Ensure code is formatted correctly with clang-format. If there are large formatting changes, they should happen in a separate whitespace-only commit on this PR after all approvals.

Adds open addressing hash table module with linear probing.
Adds unit test
Comment thread common/hash.h
BmErr hash_look_up(Hash *hash, uint32_t key, void *data);
BmErr hash_remove(Hash *hash, uint32_t key);
uint16_t hash_get_count(Hash *hash);
uint8_t hash_get_load(Hash *hash);
Contributor Author
Load factor will be a great metric to track on Bristlemouth networks to help determine how large these hash tables should be for routing purposes.

@matt001k matt001k self-assigned this Mar 11, 2026
@matt001k matt001k added the enhancement New feature or request label Mar 11, 2026
@towynlin
Contributor

Cool stuff! The main thing that occurs to me first is that this can't be the whole solution since what we're hashing are topic strings and subscriptions can have wildcards.

One solution to this is a topic trie. Tries are commonly used for autocomplete, and this hierarchical pub-sub topic routing problem is structured the same.

Mosquitto (a common MQTT pub-sub broker) uses a topic trie. The beginning of the topic string (like sensor for most Bristlemouth topics) is a child of the root node, and subsequent segments are children of that. Each node in the trie that matches the end of a topic string subscription or wildcard has a doubly-linked list of subscriptions.
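
To make the structure concrete, here is a minimal sketch of a topic trie keyed by `/`-separated segments, with `*` treated as a whole-segment wildcard. It is not Mosquitto's code or a proposed Bristlemouth API: the `Trie*` and `trie_*` names, the fixed node pool, the size limits, and the use of a subscriber count in place of a subscription list are all assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRIE_MAX_NODES 32u   /* hypothetical static pool size */
#define TRIE_MAX_SEGMENT 16u /* hypothetical segment length limit */
#define TRIE_NO_NODE 0xFFu

typedef struct {
  char segment[TRIE_MAX_SEGMENT];
  uint8_t first_child;
  uint8_t next_sibling;
  uint8_t subscribers; /* subscriptions that end at this node */
} TrieNode;

typedef struct {
  TrieNode nodes[TRIE_MAX_NODES]; /* node 0 is the root */
  uint8_t used;
} TopicTrie;

static void trie_init(TopicTrie *t) {
  memset(t, 0, sizeof(*t));
  t->nodes[0].first_child = TRIE_NO_NODE;
  t->nodes[0].next_sibling = TRIE_NO_NODE;
  t->used = 1;
}

/* Return true when the node's segment equals the first `len` bytes of `seg`. */
static bool trie_segment_equals(const TrieNode *n, const char *seg, size_t len) {
  return strlen(n->segment) == len && memcmp(n->segment, seg, len) == 0;
}

/* Add a subscription, creating one node per segment as needed. */
static bool trie_subscribe(TopicTrie *t, const char *topic) {
  uint8_t node = 0;
  for (;;) {
    size_t len = strcspn(topic, "/");
    if (len >= TRIE_MAX_SEGMENT) return false;
    uint8_t child = t->nodes[node].first_child;
    while (child != TRIE_NO_NODE &&
           !trie_segment_equals(&t->nodes[child], topic, len)) {
      child = t->nodes[child].next_sibling;
    }
    if (child == TRIE_NO_NODE) {
      if (t->used >= TRIE_MAX_NODES) return false;
      child = t->used++;
      memcpy(t->nodes[child].segment, topic, len);
      t->nodes[child].segment[len] = '\0';
      t->nodes[child].first_child = TRIE_NO_NODE;
      t->nodes[child].next_sibling = t->nodes[node].first_child;
      t->nodes[node].first_child = child;
    }
    node = child;
    if (topic[len] == '\0') break;
    topic += len + 1;
  }
  t->nodes[node].subscribers++;
  return true;
}

/* Count subscriptions under `node` that match a concrete published topic. */
static unsigned trie_match(const TopicTrie *t, uint8_t node, const char *topic) {
  size_t len = strcspn(topic, "/");
  unsigned total = 0;
  for (uint8_t c = t->nodes[node].first_child; c != TRIE_NO_NODE;
       c = t->nodes[c].next_sibling) {
    bool literal = trie_segment_equals(&t->nodes[c], topic, len);
    bool wildcard = strcmp(t->nodes[c].segment, "*") == 0;
    if (!literal && !wildcard) continue;
    if (topic[len] == '\0') {
      total += t->nodes[c].subscribers;
    } else {
      total += trie_match(t, c, topic + len + 1); /* descend one segment */
    }
  }
  return total;
}
```

With subscriptions to `sensor/*/temperature` and `sensor/node1/temperature`, calling `trie_match(&t, 0, "sensor/node1/temperature")` follows both the literal and the wildcard branch and counts two matches. The recursion is bounded by the number of segments in the topic, though a production version on a small target would likely be iterative.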

Our wildcards are different, and also new and not widely used yet. As you keep working on the resource-based routing implementation, consider whether we might deprecate certain wildcards and just use whole-segment and whole-subtree wildcards like MQTT. It would mean less flexibility for Bristlemouth developers but more structure for deciding on new topics. It also definitely covers the most commonly useful cases, such as * for a whole segment to match any node ID.
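
As a rough illustration of what restricting wildcards to those two kinds would look like, here is a sketch of a matcher that compares a subscription pattern against a published topic one segment at a time. The `*` whole-segment wildcard follows the usage above. The `#` subtree wildcard is borrowed from MQTT purely as a placeholder, since no Bristlemouth symbol for it is given here, and `topic_matches` is a hypothetical name.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true when `topic` matches `pattern`.
 * Assumptions: segments are separated by '/', a lone '*' matches exactly one
 * segment, and a lone '#' matches one or more remaining segments.
 */
static bool topic_matches(const char *pattern, const char *topic) {
  for (;;) {
    size_t plen = strcspn(pattern, "/");
    size_t tlen = strcspn(topic, "/");

    /* Subtree wildcard: everything from this segment onward matches. */
    if (plen == 1 && pattern[0] == '#') return true;

    /* Segment wildcard or an exact literal match is required here. */
    bool is_star = (plen == 1 && pattern[0] == '*');
    if (!is_star && (plen != tlen || memcmp(pattern, topic, plen) != 0)) {
      return false;
    }

    bool pattern_done = (pattern[plen] == '\0');
    bool topic_done = (topic[tlen] == '\0');
    if (pattern_done || topic_done) {
      return pattern_done && topic_done; /* both must end together */
    }
    pattern += plen + 1;
    topic += tlen + 1;
  }
}
```

Because each wildcard covers a whole segment or a whole subtree, the matcher never has to backtrack within a segment, which is what makes this style of wildcard a natural fit for a trie.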


Another thing I found in my research is that a naive implementation of a trie can consume a lot of memory for empty nodes. Patricia trees (described in a subsection of the trie Wikipedia article) are designed to mitigate this problem efficiently.

And here are a couple references Claude pointed me to that look instructive, but I haven't read them yet. LC-Trie means "level compressed trie" and LPM means "longest prefix match":

  • Nilsson & Karlsson, "IP-address Lookup Using LC-Tries" (IEEE JSAC 1999) — This is what the Linux kernel uses. Section 3 explains why tries dominate for prefix matching; Section 4 covers the level-compression that makes them practical.
  • Waldvogel et al., "Scalable High-Speed IP Routing Lookups" (SIGCOMM '97) — Binary search on prefix lengths using hash tables per level. This is one of the few legitimate uses of hash tables in LPM, but it's complex. Sections 2–3 are the core.

Here's one of the first search results for LC-Trie, which looks brief and helpful: https://raminaji.wordpress.com/lc-trie/

Have fun! ❤️

@matt001k
Contributor Author

@towynlin thank you, I really appreciate all of the information! I am researching similar implementations and am currently writing a document describing the design path for routing. Wildcard topics were my largest concern as well, and I have realized the mighty trie is most likely the most viable solution! I will share the document with the community once I have finished it.

To answer:

Is there an advantage to building our own hash table implementation?

I do like having the ability to control our own implementation. If we need to add or reduce functionality, it can be done without patching another code-base.
It is also fairly simple and easily testable, probably with a lot less storage space than other implementations!
Having the static allocation ability also provides a future where bm_core can exist without a heap, which is a necessity in safety-critical systems.
When I was working on Class C medical devices, firmware had to abide by MISRA coding standards.
Rule 21.3 of MISRA C prohibits the `<stdlib.h>` memory allocation and deallocation functions, which effectively rules out any use of the heap.
I could see similar standards existing for individuals who work in industries with aquatic vehicles/robotics.
