Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .jules/bolt.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
## 2024-06-30 - Optimize path-mining relation lookup
**Learning:** Found a performance bottleneck specific to this codebase's architecture in `src/tacet/distill/concepts.py` where `induce_relations` rebuilds forward adjacency structures repeatedly. By pre-computing these maps once, the complexity dropped dramatically from $O(|R|^2 \times \text{pairs})$ to $O(|R| \times \text{pairs})$. The benchmark showed an improvement from 32s to 26s for 20 dense relations.
**Action:** Always check for repeated graph traversal allocations or rebuilds inside nested loops when dealing with multi-relational graphs.
## 2024-07-26 - Optimize rule synthesis path-mining lookup
**Learning:** Found a similar performance bottleneck in `src/tacet/distill/distill.py`'s `mine_rules_with_stats`. It recomputes `_adj(_directed(idx[r2], inv2))` repeatedly inside a nested loop over relations for length-2 body generation. Precomputing `adj_maps` for all `(relation, inverted)` combinations outside the loop avoids O(N^2) dictionary recreation. Benchmark for 100 relations with 100 edges each improved from ~3.5s to ~2.2s.
**Action:** When performing path mining or pattern finding across KGs, precompute structural indices outside of combinatorial loops.
7 changes: 5 additions & 2 deletions src/tacet/distill/distill.py
Original file line number Diff line number Diff line change
Expand Up @@ -155,14 +155,17 @@ def atom(rel: str, inv: bool, a: str, b: str) -> tuple[str, str, str]:
candidates.append(MinedRule(rule, conf, support))

# ---- length-2 body: R1(x,z) & R2(z,y) => target(x,y) ----------------
# Pre-compute adjacency maps to avoid redundant recreation in the nested loop
adj_maps = {(r, inv): _adj(_directed(idx[r], inv)) for r in relations for inv in (False, True)}

for r1 in relations:
for inv1 in (False, True):
p1 = _adj(_directed(idx[r1], inv1))
p1 = adj_maps[(r1, inv1)]
if not p1:
continue
for r2 in relations:
for inv2 in (False, True):
p2 = _adj(_directed(idx[r2], inv2))
p2 = adj_maps[(r2, inv2)]
if not p2:
continue
raw: set[Pair] = set()
Expand Down
Loading