Reference Layer RwLock Conflicts #354

owleyeview · 2025-12-10T01:15:46Z

owleyeview
Dec 10, 2025
Collaborator

I've chased down some unnoticed panic cases in the reference layer:

Summary of Panic Cause in Loader Tests

I’ve traced the panics to a re-entrant RwLock access that occurs whenever a single holon is both the source and target of a relationship (e.g., TypeDescriptor DescribedBy TypeDescriptor).

What’s happening

StagedReference and TransientReference both implement add_related_holons and remove_related_holons by:
- taking a write lock on the underlying holon (Rc<RwLock<Holon>>), then
- delegating to the holon’s relationship logic.
Inside the relationship logic, HolonCollection::{add_references, remove_references} runs.
These functions maintain the collection’s key index by calling:
```
holon_ref.key(context)?
```
key(context) calls key_impl, which must:
- go back through the reference layer,
- obtain a read lock on the same holon (rc_holon.read()).
If the relationship is a self-edge (source == target), then:
- we already hold the write lock,
- and we now attempt to acquire a read lock on the same RwLock.
Rust’s standard RwLock is not re-entrant: acquiring it again on a thread that already holds it may panic or deadlock → here this results in a panic.
In WASM this surfaces as a RuntimeError: unreachable, which Holochain wraps and the Sweetest test harness then unwraps → panic.
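
The failure mode can be reproduced in isolation. The following is a minimal sketch, not code from the repository: `Holon`, `try_key`, and `key_while_source_locked` are hypothetical stand-ins, and `try_read` is used in place of `read` so that the conflict is reported as an error instead of panicking or blocking.

```rust
use std::sync::{Arc, RwLock, TryLockError};

/// Hypothetical stand-in for the real holon type; only the key matters here.
struct Holon {
    key: Option<String>,
}

/// Reads the holon's key, reporting lock contention as an error
/// instead of blocking or aborting.
fn try_key(holon: &Arc<RwLock<Holon>>) -> Result<Option<String>, String> {
    match holon.try_read() {
        Ok(guard) => Ok(guard.key.clone()),
        Err(TryLockError::WouldBlock) => Err("lock already held".to_string()),
        Err(TryLockError::Poisoned(_)) => Err("lock poisoned".to_string()),
    }
}

/// Mimics the add path: take the write lock on the source, then look up
/// the target's key while the write guard is still alive.
fn key_while_source_locked(
    source: &Arc<RwLock<Holon>>,
    target: &Arc<RwLock<Holon>>,
) -> Result<Option<String>, String> {
    let _write_guard = source.write().map_err(|_| "lock poisoned".to_string())?;
    try_key(target)
}
```

With two distinct holons the key lookup succeeds. When `source` and `target` are the same `Arc`, the lookup fails because the write guard is still held, which is the self-edge case described above.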

Impact

Any relationship where a holon references itself (DescribedBy, UsesKeyRule, etc.) triggers this panic during loader execution.
Both add/remove paths are affected.

How should we address this?

owleyeview · 2025-12-10T01:20:58Z

owleyeview
Dec 10, 2025
Collaborator Author

Here are some options that ChatGPT thinks we should consider 😁:

Option 1 — Precompute keys before locking the source holon (recommended)

Idea: Separate “compute keys” from “mutate relationships” so we don’t call key(context) while holding the source holon’s write lock.

Sketch:

In StagedReference::add_related_holons_impl (and helper paths like with_descriptor_impl):
- Before taking the write lock on the staged holon, compute:
```
Vec<(HolonReference, Option<MapString>)>
```
  by calling holon_ref.key(context) for each target.
Introduce parallel APIs:
- add_related_holons_with_keys → StagedRelationshipMap::add_related_holons_with_keys →
  HolonCollection::add_references_with_keys.
- These should use something like add_reference_with_key internally and never call key(context) themselves.
Inside the locked section, just use the precomputed keys; self-edges reuse the already-fetched key and never re-enter the lock.
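
The two-phase shape of this option can be sketched as follows. This is a simplified model under stated assumptions, not the actual implementation: `Holon`, `Collection`, `HolonRef`, and `key_of` are hypothetical stand-ins, keys are plain `String`s instead of `MapString`, and references are modeled as `Arc`s instead of going through a context.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for HolonReference.
type HolonRef = Arc<RwLock<Holon>>;

#[derive(Default)]
struct Collection {
    members: Vec<HolonRef>,
    keyed_index: BTreeMap<String, usize>,
}

impl Collection {
    /// Keyed variant: consumes precomputed keys and never locks a target.
    fn add_references_with_keys(&mut self, entries: Vec<(HolonRef, Option<String>)>) {
        for (holon_ref, key) in entries {
            let index = self.members.len();
            self.members.push(holon_ref);
            if let Some(key) = key {
                self.keyed_index.insert(key, index);
            }
        }
    }
}

struct Holon {
    key: Option<String>,
    related: Collection,
}

/// Short-lived read lock, released before this function returns.
fn key_of(holon: &HolonRef) -> Option<String> {
    holon.read().expect("holon lock poisoned").key.clone()
}

fn add_related_holons(source: &HolonRef, targets: Vec<HolonRef>) {
    // Phase 1: resolve keys while no lock on the source is held.
    let entries: Vec<(HolonRef, Option<String>)> = targets
        .into_iter()
        .map(|target| {
            let key = key_of(&target);
            (target, key)
        })
        .collect();

    // Phase 2: take the write lock and mutate using only precomputed keys.
    let mut guard = source.write().expect("holon lock poisoned");
    guard.related.add_references_with_keys(entries);
}
```

Because every read lock is released before the write lock is taken, a self-edge goes through the same path as any other edge and still ends up in the keyed index.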

Pros:

Keeps keyed_index behavior intact and consistent for all relationships.
Clean separation of concerns: key resolution happens outside the critical section.
Fixes both add/remove self-edge paths in a principled way.

Cons:

Requires adding new “with_keys” variants and a bit of mechanical call-site refactoring.

Option 2 — Special-case self-edges in the existing add path

Idea: Detect when the target is the same holon as the source and avoid calling key(context) in that case.

Sketch:

When adding relationships from a StagedReference, we already know the source’s identity (temporary id / local id).
Pass this identity (or the source key) down into the add pipeline:
- StagedReference::add_related_holons_impl →
  StagedHolon::add_related_holons →
  StagedRelationshipMap::add_related_holons →
  HolonCollection::add_references.
In HolonCollection::add_references, if the target HolonReference is the same staged holon as the source:
- Skip holon_ref.key(context) and instead reuse the source holon’s key (which we already have under the write lock).
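
A rough sketch of the self-edge check, under simplifying assumptions: the types below are hypothetical stand-ins, and identity is compared with `Arc::ptr_eq`, whereas the real code would compare temporary or local ids as described above.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in: the relationship collection is flattened into the holon.
struct Holon {
    key: Option<String>,
    members: Vec<Arc<RwLock<Holon>>>,
    keyed_index: BTreeMap<String, usize>,
}

fn add_related_holons(source: &Arc<RwLock<Holon>>, targets: Vec<Arc<RwLock<Holon>>>) {
    let mut guard = source.write().expect("holon lock poisoned");
    for target in targets {
        let key = if Arc::ptr_eq(source, &target) {
            // Self-edge: reuse the key we can already see through the write guard.
            guard.key.clone()
        } else {
            // Different holon: a read lock on it cannot conflict with our write lock.
            target.read().expect("holon lock poisoned").key.clone()
        };
        let index = guard.members.len();
        guard.members.push(target);
        if let Some(key) = key {
            guard.keyed_index.insert(key, index);
        }
    }
}
```

The branch on identity is the ad-hoc part: every layer between the reference and the collection has to carry the source identity for this check to be possible.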

Pros:

Smaller change footprint than Option 1.
Minimizes new API surface; mostly plumbing identity/key down to the collection layer.

Cons:

Slightly more ad-hoc: special behavior for self-edges only.
More identity plumbing across several layers; easier to get wrong than the precompute-keys pattern.

Option 3 — Skip `keyed_index` updates when key lookup would re-enter

Idea: Favor safety over indexing: if key lookup risks re-entry, don’t index that entry by key.

Sketch:

In HolonCollection::add_references:
- If holon_ref.key(context) would re-enter (e.g., we can detect source == target or a specific lock error), bail out of key lookup and:
  - Still push the member into members,
  - But do not update keyed_index for that entry.
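
One way to read "detect a specific lock error" is to probe with `try_read` and treat a failure as "do not index". The sketch below assumes that interpretation and uses hypothetical stand-in types; it is not the actual HolonCollection code.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in: the relationship collection is flattened into the holon.
struct Holon {
    key: Option<String>,
    members: Vec<Arc<RwLock<Holon>>>,
    keyed_index: BTreeMap<String, usize>,
}

fn add_related_holons(source: &Arc<RwLock<Holon>>, targets: Vec<Arc<RwLock<Holon>>>) {
    let mut guard = source.write().expect("holon lock poisoned");
    for target in targets {
        // try_read fails when the target is the holon we hold the write lock on.
        let key = match target.try_read() {
            Ok(target_guard) => target_guard.key.clone(),
            Err(_) => None, // re-entry (or poisoning): leave this member unindexed
        };
        let index = guard.members.len();
        guard.members.push(target); // always recorded as a member
        if let Some(key) = key {
            guard.keyed_index.insert(key, index);
        }
    }
}
```

After adding a self-edge and one ordinary edge, both appear in `members`, but only the ordinary edge is reachable through `keyed_index`.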

Pros:

Easiest change to implement.
Avoids the re-entrant lock and stops the panic with minimal refactoring.

Cons:

keyed_index will be incomplete for self-edges:
- Lookups by key on that relationship might miss the self-edge.
Behavior becomes subtle: “usually indexed, except for some edges.”

Option 4 — Rework the locking strategy

Idea: Change the lock order so we never hold the source’s write lock while also needing to lock the target for key lookup.

Sketch (high-level):

Instead of:
- lock source holon (write) → call into relationship map → HolonCollection::add_references → holon_ref.key(context) re-locks the same holon
Move toward:
- Lock relationship collection first,
- Take short-lived read locks on target holons to fetch keys,
- Then mutate the source holon in a way that avoids re-entering the same RwLock.

Pros:

Conceptually “pure”: we avoid the re-entrancy by design.
Might simplify some invariants long-term.

Cons:

Most intrusive and risky:
- Changes lock ordering and may introduce deadlock hazards if not done very carefully.
Touches core holon/relationship mutation flows; higher regression risk.

Suggested Direction

If we want a robust fix that preserves current behavior and minimizes surprises, Option 1 (precompute keys before locking) is the strongest candidate:

It keeps keyed_index fully functional,
Avoids re-entry cleanly,
And the required API changes are mechanical and auditable.

0 replies

evomimic · 2025-12-10T18:48:16Z

evomimic
Dec 10, 2025
Maintainer

Thanks for sleuthing this out and surfacing your findings as a Discussion @owleyeview ! This uncovers some important design considerations.

Of the options ChatGPT suggested, Options 2 and 3 are non-starters due to lost functionality. Option 1 is viable, and Option 4 is perhaps not as risky as advertised, given that we recently moved lock acquisition responsibility inside the implementation of the reference layer instead of leaving it with callers of the reference layer. So we have a clear line of sight to the scope of any given operation and never hold locks outside the scope of any single operation. This helps!

That said, I think the simplest and most effective fix is to just reduce lock granularity for add_related_holons.

Note that we don't actually need a write lock on the holon when adding/removing related holons from one of the relationships in its relationship map.

StagedRelationshipMap (and TransientRelationshipMap) already wrap each target HolonCollection in Arc<RwLock<_>> and acquire the write lock on the collection, not on the source holon.

```
pub struct StagedRelationshipMap {
    pub map: BTreeMap<RelationshipName, Arc<RwLock<HolonCollection>>>,
}
```

...and locking the collection (not the source holon) when adding references.

```
impl WritableRelationship for StagedRelationshipMap {
    /// Adds holon references to a staged relationship, creating the collection if needed.
    fn add_related_holons(
        &mut self,
        context: &dyn HolonsContextBehavior,
        relationship_name: RelationshipName,
        holons: Vec<HolonReference>,
    ) -> Result<(), HolonError> {
        let lock = self
            .map
            .entry(relationship_name)
            .or_insert_with(|| Arc::new(RwLock::new(HolonCollection::new_staged())));
        lock.write()
            .map_err(|e| {
                HolonError::FailedToAcquireLock(format!(
                    "Failed to acquire write lock on holon collection: {}",
                    e
                ))
            })?
            .add_references(context, holons)?;
        Ok(())
    }
}
```

So the StagedReference and TransientReference implementations of add_related_holons should only be getting a read lock, NOT a write lock, on the rc_holon:

```
impl WritableHolonImpl for StagedReference {
    fn add_related_holons_impl(
        &mut self,
        context: &dyn HolonsContextBehavior,
        relationship_name: RelationshipName,
        holons: Vec<HolonReference>,
    ) -> Result<&mut Self, HolonError> {
        self.is_accessible(context, AccessType::Write)?;
        let rc_holon = self.get_rc_holon(context)?;
        let mut holon_mut = rc_holon.write().unwrap(); // <--- just get a read lock here!!!
        holon_mut.add_related_holons(context, relationship_name, holons)?;

        Ok(self)
    }
}
```

Recommendation: Option 5

Change the implementation of add_related_holons in StagedReference and TransientReference to only acquire a read lock instead of write lock on their rc_holon.

1 reply

owleyeview Dec 10, 2025
Collaborator Author

I want to double-check my understanding of your recommendation around add_related_holons.

Right now, the panic is caused by the self-edge case where we:

Take a write lock on the source holon’s Arc<RwLock<Holon>> in StagedReference::add_related_holons_impl / TransientReference::add_related_holons_impl.
Call holon_mut.add_related_holons(...), which eventually calls HolonCollection::add_references.
HolonCollection::add_references calls holon_ref.key(context).
For a self-edge, that holon_ref points back to the same holon, so key_impl tries to read-lock the same RwLock<Holon> while the write guard is held → re-entrant lock → WASM panic.

Conceptually, you’re saying: “we shouldn’t need a write lock on the whole holon just to mutate a relationship collection; the per-relationship HolonCollection is already behind an Arc<RwLock<_>>, so we should treat relationships as internally mutable and only lock those collections.”

I want to confirm that you’re suggesting we actually make that architectural change now, rather than taking a smaller / local fix (like precomputing keys before locking the holon).

If we do go that route, here’s how I’m picturing the concrete changes:

Make relationship maps use interior mutability for the map itself

Today TransientRelationshipMap / StagedRelationshipMap look like:
```
pub struct TransientRelationshipMap {
    pub map: BTreeMap<RelationshipName, Arc<RwLock<HolonCollection>>>,
}
```
and their add_related_holons takes &mut self.

To support holding only a read lock on the holon, we’d need relationship mutation to not require &mut Holon. That implies something like:
- Either wrapping the entire map in its own RwLock (e.g. RwLock<BTreeMap<...>>), or
- Otherwise re-designing so that the APIs for adding/removing relationships only need &self and use internal locks.

Change relationship mutation APIs to operate on &self

For example, TransientRelationshipMap::add_related_holons / StagedRelationshipMap::add_related_holons would change from:
```
fn add_related_holons(&mut self, ...) -> Result<(), HolonError>
```
to something like:
```
fn add_related_holons(&self, ...) -> Result<(), HolonError>
```
and do all mutation via internal RwLock guards on the map / collections.

Adjust WritableHolonState to match

The WritableHolonState impls for TransientHolon / StagedHolon currently take &mut self and mutate *_relationships directly.

Under the “relationships are internally mutable” model, add_related_holons on the holon could still take &mut self for API symmetry if we want, but in principle it could also be implemented in a way that:
- Uses is_accessible(AccessType::Write) to enforce write permission, but
- Only needs &self to delegate to the internally-mutable relationship map.
The important part is: callers like StagedReference::add_related_holons_impl no longer need a write lock on Arc<RwLock<Holon>> to mutate relationships—they could hold a read lock and rely on interior mutability of the relationship structures.

Update StagedReference / TransientReference to use only a read lock

Once the above refactor is in place, we could safely change:
```
let rc_holon = self.get_rc_holon(context)?;
let mut holon_mut = rc_holon.write()?;
holon_mut.add_related_holons(...)
```
to something like:
```
self.is_accessible(context, AccessType::Write)?;
let rc_holon = self.get_rc_holon(context)?;
let holon = rc_holon.read()?;
holon.add_related_holons(...) // now only needs &self and uses inner locks
```
At that point, the self-edge case no longer tries to re-enter the same RwLock<Holon>, because the outer lock is read-only and all actual mutation is pushed into the inner HolonCollection / relationship map locks.
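
To make the proposed shape concrete, here is a simplified sketch of relationship mutation under a read lock on the holon. Everything in it is an assumption for illustration: `Holon`, `Collection`, and `add_related` are hypothetical stand-ins, relationship names and keys are plain `String`s, and the nested key lookup uses `try_read` because the std documentation does not guarantee that a blocking `read` succeeds when the current thread already holds the lock.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for HolonReference.
type HolonRef = Arc<RwLock<Holon>>;

#[derive(Default)]
struct Collection {
    members: Vec<HolonRef>,
    keyed_index: BTreeMap<String, usize>,
}

struct Holon {
    key: Option<String>,
    // The map carries its own lock, so mutation only needs &Holon.
    relationships: RwLock<BTreeMap<String, Arc<RwLock<Collection>>>>,
}

impl Holon {
    fn add_related_holons(&self, relationship: &str, targets: Vec<HolonRef>) -> Result<(), String> {
        // Find or create the collection; the map guard is released at the end of this statement.
        let arc_collection = self
            .relationships
            .write()
            .map_err(|_| "relationship map lock poisoned".to_string())?
            .entry(relationship.to_string())
            .or_insert_with(|| Arc::new(RwLock::new(Collection::default())))
            .clone();

        let mut collection = arc_collection
            .write()
            .map_err(|_| "collection lock poisoned".to_string())?;
        for target in targets {
            // On a self-edge this is a second shared acquisition of the holon lock.
            let key = target
                .try_read()
                .map_err(|_| "target holon is locked for writing".to_string())?
                .key
                .clone();
            let index = collection.members.len();
            collection.members.push(target);
            if let Some(key) = key {
                collection.keyed_index.insert(key, index);
            }
        }
        Ok(())
    }
}

/// Reference-layer entry point: only a read lock on the source holon.
fn add_related(source: &HolonRef, relationship: &str, targets: Vec<HolonRef>) -> Result<(), String> {
    let holon = source.read().map_err(|_| "holon lock poisoned".to_string())?;
    holon.add_related_holons(relationship, targets)
}
```

In this model a self-edge succeeds because both acquisitions of the holon lock are shared. Whether nested shared acquisition is acceptable on every target platform is an open question the sketch does not settle.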

Can you confirm that this is the direction you’d like us to take now (refactor relationship maps to be fully internally mutable and shift relationship mutation under read-locks on the holon)? This seems like a bigger refactor than pre-computing keys for now, but maybe it's the right way to go.

evomimic · 2025-12-10T23:58:43Z

evomimic
Dec 10, 2025
Maintainer

Ah... I'd forgotten that add_related_holons is not just used to add targets to an existing relationship, but also can add new entries to the relationship map's BTreeMap. So, yes, we'll need to have the entire map support interior mutability. This makes the fix substantially larger, but still architecturally the correct move.

Option 5 would have to change the definitions of StagedRelationshipMap and TransientRelationshipMap to something like this:

```
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

pub struct StagedRelationshipMap {
    pub map: RwLock<BTreeMap<RelationshipName, Arc<RwLock<HolonCollection>>>>,
}
```

Then we'd do a 2-step locking protocol for add_related_holons (for both staged and transient holons).

```
impl WritableRelationship for StagedRelationshipMap {
    fn add_related_holons(
        &self, // note: no longer &mut self
        context: &dyn HolonsContextBehavior,
        relationship_name: RelationshipName,
        holons: Vec<HolonReference>,
    ) -> Result<(), HolonError> {
        // Step 1: Try read-locking the map to see if the collection exists
        if let Some(arc_collection) = self.map.read()
            .map_err(|e| HolonError::FailedToAcquireLock(format!(
                "Failed to read-lock relationship map: {}", e
            )))?
            .get(&relationship_name)
            .cloned()
        {
            // Relationship already exists: mutate the collection directly
            arc_collection.write()
                .map_err(|e| HolonError::FailedToAcquireLock(format!(
                    "Failed to acquire write lock on holon collection: {}", e
                )))?
                .add_references(context, holons)?;
            return Ok(());
        }

        // Step 2: If not found, acquire write-lock to insert new entry.
        // entry() re-checks under the write lock, so a collection inserted by
        // another caller between steps 1 and 2 is reused instead of overwritten.
        let mut map = self.map.write()
            .map_err(|e| HolonError::FailedToAcquireLock(format!(
                "Failed to write-lock relationship map: {}", e
            )))?;

        let arc_collection = map
            .entry(relationship_name)
            .or_insert_with(|| Arc::new(RwLock::new(HolonCollection::new_staged())))
            .clone();
        arc_collection.write()
            .map_err(|e| HolonError::FailedToAcquireLock(format!(
                "Failed to acquire write lock on holon collection: {}", e
            )))?
            .add_references(context, holons)?;

        Ok(())
    }
}
```

But the structural change to the two relationship maps has a pretty large ripple effect on both staged.rs and transient.rs.

Even worse, the change to interior mutability will require a change to the WritableHolon trait for add_related_holons and remove_related_holons. This trait is a key part of the public API, so the ripple effect of this is unacceptably large to tackle in Issue #333.

Recommendation

I agree the best path forward is Option 1.

0 replies

owleyeview · 2025-12-11T04:55:12Z

owleyeview
Dec 11, 2025
Collaborator Author

Update

Option 1 (precomputing keys before acquiring the lock and then passing them down the API chain) has been implemented.

Added keyed variants to the relationship APIs (WritableHolonState and WritableRelationship) so callers can pass (HolonReference, Option<MapString>) without needing context inside the lock.
Extended HolonCollectionApi with keyed add/remove helpers that consume precomputed keys and rebuild the keyed index without calling back into key(context).
Updated staged/transient holon/reference implementations to precompute keys before taking the holon write lock and to use the keyed relationship path end-to-end.

Edit: I also created issue #355 to remind us to come back to this at some point.

0 replies
