Add Hash Table Module by matt001k · Pull Request #125 · bristlemouth/bm_core

matt001k · 2026-03-11T18:42:34Z

What changed?

Adds an open addressing hash table module with linear probing, to be used for Bristlemouth middleware routing.
It handles collisions by searching linearly for the next available slot in the table to insert data into.
Adds unit test.
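
For readers unfamiliar with the technique, here is a minimal sketch of open addressing with linear probing. It is not the code in this PR: `Slot`, `SLOT_COUNT`, `probe_insert`, and `probe_look_up` are hypothetical names, the key is used directly as its own hash, and removal is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_COUNT 8u /* hypothetical fixed capacity */

typedef struct {
  bool used;
  uint32_t key;
  uint32_t value;
} Slot;

/* Insert or update a key; returns false when every slot is taken. */
static bool probe_insert(Slot *slots, uint32_t key, uint32_t value) {
  assert(slots != NULL);
  uint32_t home = key % SLOT_COUNT; /* assumption: the key is its own hash */
  for (uint32_t i = 0; i < SLOT_COUNT; i++) {
    /* Step forward one slot at a time, wrapping at the end of the table. */
    Slot *s = &slots[(home + i) % SLOT_COUNT];
    if (!s->used || s->key == key) {
      s->used = true;
      s->key = key;
      s->value = value;
      return true;
    }
  }
  return false;
}

/* Look up a key; the first empty slot ends the probe sequence. */
static bool probe_look_up(const Slot *slots, uint32_t key, uint32_t *value) {
  assert(slots != NULL && value != NULL);
  uint32_t home = key % SLOT_COUNT;
  for (uint32_t i = 0; i < SLOT_COUNT; i++) {
    const Slot *s = &slots[(home + i) % SLOT_COUNT];
    if (!s->used) {
      return false;
    }
    if (s->key == key) {
      *value = s->value;
      return true;
    }
  }
  return false;
}
```

With a zero-initialized `Slot slots[SLOT_COUNT]`, inserting keys 3 and then 11 puts both in the same home slot (index 3), so the second key lands in index 4. The caller owns the storage, so the array can be statically allocated.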

How does it make Bristlemouth better?

This is a proposed method to do routing in Bristlemouth.
As long as the load factor is below 0.7, the average lookup time complexity is $$O(1)$$. The worst-case complexity is $$O(n)$$, which can happen if a poor hashing algorithm is chosen.
Lookup complexity can degrade quickly once the load factor rises above 0.7.
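
For a rough sense of how quickly this degrades, Knuth's classic approximations for linear probing give the expected number of probes at load factor $$\alpha$$. They assume the hash function spreads keys uniformly, which has not been verified for this module since no hash function is chosen yet:

$$E_{\text{hit}} \approx \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right), \qquad E_{\text{miss}} \approx \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^{2}}\right)$$

At $$\alpha = 0.7$$ this works out to roughly 2.2 probes for a successful lookup and 6.1 for an unsuccessful one. At $$\alpha = 0.9$$ the same formulas give 5.5 and 50.5.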

Where should reviewers focus?

When considering solutions for routing messages on Bristlemouth, I thought a hash table would provide fast lookups and insertions.
Linear probing was chosen because these tables will most likely be reasonably sized (let's say a maximum of 1024 elements), and even in worst-case scenarios the STM32U5 can fly through iterating over the table.
This paper explains some of the benefits of linear probing.
I decided against supporting rehashing because it is a slow operation ($$O(n)$$) that could add latency to lookups and cause memory problems. Tables will also most likely not grow during operation, since nodes will discover resources close to boot time.

Let me know your thoughts on this implementation.
I also have not chosen a hash function yet. I know bm_protocol uses FNV hashing, which tends to be fairly solid for strings (fast, with well-distributed hash values).
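
As a point of reference, here is a minimal sketch of the 32-bit FNV-1a variant. Which FNV variant bm_protocol uses is not stated here, so FNV-1a is an assumption chosen for illustration, and `fnv1a_32` is a hypothetical name rather than an existing Bristlemouth function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard 32-bit FNV parameters. */
#define FNV1A_32_OFFSET_BASIS 2166136261u
#define FNV1A_32_PRIME 16777619u

/* Hash a NUL-terminated string with 32-bit FNV-1a (XOR first, then multiply). */
static uint32_t fnv1a_32(const char *str) {
  assert(str != NULL);
  uint32_t hash = FNV1A_32_OFFSET_BASIS;
  for (const unsigned char *p = (const unsigned char *)str; *p != '\0'; p++) {
    hash ^= *p;             /* fold in the next byte */
    hash *= FNV1A_32_PRIME; /* unsigned multiply wraps modulo 2^32 */
  }
  return hash;
}
```

The resulting `uint32_t` has the same type as the `key` parameter in the proposed `hash_look_up` and `hash_remove` signatures, so a topic string could be hashed once and the result passed in as the key.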

Checklist

Add or update unit tests for changed code
Ensure all submodules are up to date. If this PR relies on changes in submodules, merge those PRs first, then point this PR at/after the merge commit
Ensure code is formatted correctly with clang-format. If there are large formatting changes, they should happen in a separate whitespace-only commit on this PR after all approvals.

matt001k · 2026-03-11T18:44:29Z

+BmErr hash_look_up(Hash *hash, uint32_t key, void *data);
+BmErr hash_remove(Hash *hash, uint32_t key);
+uint16_t hash_get_count(Hash *hash);
+uint8_t hash_get_load(Hash *hash);


Load factor will be a great metric to track on Bristlemouth networks to help determine how large these hash tables should be for routing purposes.
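
The diff above does not show what units `hash_get_load` reports. Assuming it is a whole-number percentage (a guess based on the `uint8_t` return type), the calculation could look like this sketch, where `load_percent` is a hypothetical helper and not the PR's implementation:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: load factor as a whole-number percentage (0 to 100),
 * rounded down. Assumes count never exceeds capacity. */
static uint8_t load_percent(uint16_t count, uint16_t capacity) {
  assert(capacity != 0u && count <= capacity);
  /* Widen before multiplying so count * 100 cannot overflow 16 bits. */
  return (uint8_t)(((uint32_t)count * 100u) / capacity);
}
```

For a 1024-slot table, 717 entries is the first count at which this reports 70.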

towynlin · 2026-03-14T01:28:55Z

Cool stuff! The main thing that occurs to me first is that this can't be the whole solution since what we're hashing are topic strings and subscriptions can have wildcards.

One solution to this is a topic trie. Tries are commonly used for autocomplete, and this hierarchical pub-sub topic routing problem is structured the same.

Mosquitto (a common MQTT pub-sub broker) uses a topic trie. The beginning of the topic string (like sensor for most Bristlemouth topics) is a child of the root node, and subsequent segments are children of that. Each node in the trie that matches the end of a topic string subscription or wildcard has a doubly-linked list of subscriptions.

Our wildcards are different and also new and not widely used yet. As you keep working on the resource-based routing implementation, consider whether we might deprecate certain wildcards and just use whole-segment and whole-subtree wildcards like MQTT. It would mean less flexibility for Bristlemouth developers and more structure for how to decide on new topics. It also definitely covers the most commonly useful cases, like * for a whole segment to match any node id.
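
To make the idea concrete, here is a minimal sketch of a statically allocated topic trie that supports only the whole-segment `*` wildcard. It is not Mosquitto's implementation or anything in bm_core: every name is hypothetical, `/` is assumed as the segment separator, each node keeps a subscription count instead of a list, and the whole-subtree wildcard is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_NODES 32u   /* hypothetical node pool size */
#define MAX_SEGMENT 16u /* hypothetical segment buffer size, including NUL */

typedef struct TrieNode {
  char segment[MAX_SEGMENT];
  struct TrieNode *child;   /* first child */
  struct TrieNode *sibling; /* next node at the same level */
  uint16_t subscribers;     /* subscriptions that end at this node */
} TrieNode;

typedef struct {
  TrieNode pool[MAX_NODES]; /* static storage, no heap */
  uint16_t used;
  TrieNode root;
} TopicTrie;

/* Length of the segment at the start of s, up to '/' or the end. */
static size_t segment_len(const char *s) {
  size_t n = 0;
  while (s[n] != '\0' && s[n] != '/') {
    n++;
  }
  return n;
}

static TrieNode *find_child(TrieNode *parent, const char *seg, size_t len) {
  for (TrieNode *c = parent->child; c != NULL; c = c->sibling) {
    if (strlen(c->segment) == len && memcmp(c->segment, seg, len) == 0) {
      return c;
    }
  }
  return NULL;
}

/* Add one subscription pattern. Returns false if the pool is exhausted or a
 * segment is too long; nodes created before the failure stay in the trie. */
static bool trie_subscribe(TopicTrie *trie, const char *pattern) {
  assert(trie != NULL && pattern != NULL);
  TrieNode *node = &trie->root;
  for (;;) {
    size_t len = segment_len(pattern);
    TrieNode *next = find_child(node, pattern, len);
    if (next == NULL) {
      if (len >= MAX_SEGMENT || trie->used >= MAX_NODES) {
        return false;
      }
      next = &trie->pool[trie->used++];
      memset(next, 0, sizeof(*next));
      memcpy(next->segment, pattern, len); /* memset left the NUL in place */
      next->sibling = node->child;
      node->child = next;
    }
    node = next;
    if (pattern[len] == '\0') {
      break;
    }
    pattern += len + 1;
  }
  node->subscribers++;
  return true;
}

/* Count the subscriptions that match a published topic. Recursion is used
 * for brevity; a "*" child matches any single segment. */
static uint32_t trie_match(const TrieNode *node, const char *topic) {
  assert(node != NULL && topic != NULL);
  size_t len = segment_len(topic);
  uint32_t total = 0;
  for (const TrieNode *c = node->child; c != NULL; c = c->sibling) {
    bool wildcard = strcmp(c->segment, "*") == 0;
    bool exact =
        strlen(c->segment) == len && memcmp(c->segment, topic, len) == 0;
    if (!wildcard && !exact) {
      continue;
    }
    if (topic[len] == '\0') {
      total += c->subscribers;
    } else {
      total += trie_match(c, topic + len + 1);
    }
  }
  return total;
}
```

After subscribing to `sensor/*/temp` and `sensor/abc/temp`, a publish to `sensor/abc/temp` matches both subscriptions, while `sensor/xyz/temp` matches only the wildcard one. Restricting wildcards to whole segments is what keeps the match a simple walk down the trie, one segment per level.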

mosquitto__subhier (hashable trie node) and mosquitto__subleaf (list of subscriptions at each trie node)
mosquitto uses uthash — Is there an advantage to building our own hash table implementation?
sub__search function for searching the trie when routing a published message

Another thing I found in my research is that a naive implementation of a trie can consume a lot of memory for empty nodes. Patricia trees (described in a subsection of the trie Wikipedia article) are designed to mitigate this problem efficiently.

And here are a couple references Claude pointed me to that look instructive, but I haven't read them yet. LC-Trie means "level compressed trie" and LPM means "longest prefix match":

Nilsson & Karlsson, "IP-address Lookup Using LC-Tries" (IEEE JSAC 1999) — This is what the Linux kernel uses. Section 3 explains why tries dominate for prefix matching; Section 4 covers the level-compression that makes them practical.
Waldvogel et al., "Scalable High-Speed IP Routing Lookups" (SIGCOMM '97) — Binary search on prefix lengths using hash tables per level. This is one of the few legitimate uses of hash tables in LPM, but it's complex. Sections 2–3 are the core.

Here's one of the first search results for LC-Trie, which looks brief and helpful: https://raminaji.wordpress.com/lc-trie/

Have fun! ❤️

matt001k · 2026-03-16T18:05:25Z

@towynlin thank you, I really appreciate all of the information! I am researching similar implementations and am currently writing a document describing the design path for routing. Wildcard topics were my largest concern as well, and I have realized the mighty trie is most likely the most viable solution! I will share the document with the community once I have it finished.

To answer:

Is there an advantage to building our own hash table implementation?

I do like having the ability to control our own implementation: if we need to add or reduce functionality, it can be done without patching another codebase.
It is also fairly simple and easily testable, and probably needs a lot less storage space than other implementations!
Having the static allocation ability also provides a future where bm_core can exist without a heap, which is a necessity in safety-critical systems.
When I was working on Class C medical devices, firmware had to abide by MISRA coding standards.
Rule 21.3 of MISRA C prohibits the standard library's dynamic memory allocation functions, which effectively rules out use of the heap.
I could see similar standards existing for individuals who work in industries with aquatic vehicles/robotics.

matt001k added 2 commits March 10, 2026 16:43

Add Hash Table Module

3befa7a

Adds open addressing hash table module with linear probing. Adds unit test

Add function to obtain load factor

ed5707a

matt001k self-assigned this Mar 11, 2026

matt001k added the enhancement New feature or request label Mar 11, 2026

matt001k requested review from towynlin and victorsowa12 March 11, 2026 18:44

matt001k wants to merge 2 commits into main from feature/hash_table