
Commit 530e146

Roo Code and ruvnet committed
Add PRD: Semantic Data Pipeline Remediation — wire existing components together
Comprehensive PRD based on a 3-agent audit of the full markdown→Neo4j→GPU→client pipeline. Identifies 7 data loss points where existing, working code is disconnected. All components exist (parsers, actors, CUDA kernels, constraint translators, binary protocol fields) but aren't wired together.

Key finding: 8 of 9 relationship types from markdown OntologyBlocks are lost. 110K OWL axiom nodes sit isolated in Neo4j. SemanticForcesActor and OntologyConstraintActor are spawned but never receive data, and semantic_forces.cu is never invoked.

4-phase remediation: data integrity → GPU pipeline → semantic forces → client. ~410 lines across ~15 files. No new systems, just connecting existing ones.

Co-Authored-By: claude-flow <ruv@ruv.net>
1 parent 7491020 commit 530e146

1 file changed

Lines changed: 179 additions & 0 deletions

@@ -0,0 +1,179 @@
# PRD: Semantic Data Pipeline Remediation

**Date**: 2026-03-25
**Status**: APPROVED FOR EXECUTION
**Priority**: P0 (last major refactor)
**Principle**: Markdown is the source of truth. Neo4j is speed middleware. The GPU visualises the semantic structure that already exists in the data.

---

## 1. Problem Statement

The Logseq markdown files contain rich semantic relationships (9+ types: is-subclass-of, has-part, requires, enables, depends-on, relates-to, bridges-to/from, explicit_link, namespace). The parsing pipeline extracts all of them. But **only wikilinks survive** to the client graph: 8 of 9 relationship types are lost at various pipeline stages, and 110,209 OWL axiom nodes sit isolated in Neo4j.

**Quantitative gap**: 490 edges reach the client from a dataset containing ~2,600 potential relationships (980 EDGE + 623 SUBCLASS_OF + ~1,000 from axiom materialisation).

---

## 2. Root Cause Analysis (from 3-agent audit)

### 7 Data Loss Points

| # | Stage | What's Lost | Root Cause | Severity |
|---|-------|-------------|------------|----------|
| **DL1** | OwlClass storage | 8/9 relationship types | `add_owl_class()` only stores `parent_classes` as SUBCLASS_OF. has-part, requires, enables etc. dropped | **CRITICAL** |
| **DL2** | Graph load (OwlClass path) | Non-hierarchical edges | `load_graph()` only queries SUBCLASS_OF between OwlClasses | HIGH |
| **DL3** | CSR construction | Edge type metadata | ForceComputeActor flattens edges to `(target, weight)`, discarding edge_type | HIGH |
| **DL4** | GPU analytics → AppState | cluster_id, anomaly_score, community_id | ClusteringActor/AnomalyDetectionActor compute but never write to `app_state.node_analytics` | HIGH |
| **DL5** | Binary protocol | Analytics fields | Wire format V3 has fields but TypeScript never reads them | MEDIUM |
| **DL6** | OwlAxiom → edges | 110K axioms | Stored as isolated nodes, never materialised as graph edges | HIGH |
| **DL7** | Constraint pipeline | OWL axiom → physics forces | OntologyConstraintTranslator exists but `apply_ontology_constraints()` never called | MEDIUM |

### Existing Code That Works (But Is Disconnected)

| Component | File | Status |
|-----------|------|--------|
| OntologyParser extracts 9+ relationship types | `parsers/ontology_parser.rs:354-397` | **Working** |
| GitHubSyncService creates typed edges with OWL IRIs | `github_sync_service.rs:382-490` | **Working** |
| GraphNode EDGE relationships in Neo4j | `neo4j_adapter.rs:582` | **Working** |
| SemanticForcesActor (DAG, type clustering, collision) | `gpu/semantic_forces_actor.rs` | **Spawned, no data** |
| OntologyConstraintActor (axiom → forces) | `gpu/ontology_constraint_actor.rs` | **Spawned, no data** |
| OntologyConstraintTranslator (5 constraint types) | `physics/ontology_constraints.rs` | **Implemented, never called** |
| WhelkInferenceEngine (transitive closure) | `adapters/whelk_inference_engine.rs` | **Working, output unused** |
| semantic_forces.cu CUDA kernel | `utils/semantic_forces.cu` | **Compiled, never invoked** |
| Binary protocol V3 analytics fields | `utils/binary_protocol.rs:40-50` | **Declared, always zero** |
| ClusterHulls component | `graph/components/ClusterHulls.tsx` | **Renders, no cluster data** |

---

## 3. Design Principle

**Do not create new systems. Wire the existing ones together.**

The architecture is sound. Every component exists. The problem is 7 broken wires between them.

---

## 4. Remediation Plan

### Phase 1: Data Integrity (Fix DL1, DL2, DL6)
*Goal: All markdown relationships reach Neo4j as edges*

#### 1.1 Store ALL relationship types as Neo4j edges
**File**: `src/adapters/neo4j_ontology_repository.rs` → `add_owl_class()`
**Change**: After storing SUBCLASS_OF for parent_classes, also store:
- `has_part` → `:RELATES {relationship_type: "has_part", owl_property_iri: "mv:hasPart"}`
- `requires` → `:RELATES {relationship_type: "requires"}`
- `depends_on`, `enables`, `relates_to`, `bridges_to`, `bridges_from` → `:RELATES` with the matching `relationship_type`

**Impact**: 500+ new Neo4j edges from existing parsed data
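
The mapping in 1.1 could be sketched as below. This is a minimal illustration, not the repository's code: the `RelationshipType` enum, `RELATES_CYPHER`, and `relates_params` are hypothetical names, and the statement text is an assumption about how the write might be parameterised. Only `mv:hasPart` is given as an IRI in this PRD, so the other types leave the IRI unset rather than guessing one.

```rust
/// Non-hierarchical relationship types listed in section 1.1.
/// (Hypothetical enum; the real parser output type is not shown in the PRD.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RelationshipType {
    HasPart,
    Requires,
    DependsOn,
    Enables,
    RelatesTo,
    BridgesTo,
    BridgesFrom,
}

impl RelationshipType {
    /// Value stored in the `relationship_type` edge property.
    pub fn as_property(self) -> &'static str {
        match self {
            Self::HasPart => "has_part",
            Self::Requires => "requires",
            Self::DependsOn => "depends_on",
            Self::Enables => "enables",
            Self::RelatesTo => "relates_to",
            Self::BridgesTo => "bridges_to",
            Self::BridgesFrom => "bridges_from",
        }
    }

    /// OWL property IRI where the PRD states one; otherwise None.
    pub fn owl_property_iri(self) -> Option<&'static str> {
        match self {
            Self::HasPart => Some("mv:hasPart"),
            _ => None,
        }
    }
}

/// Assumed statement for one RELATES edge. MERGE keeps repeated syncs idempotent.
pub const RELATES_CYPHER: &str = "\
MATCH (s:OwlClass {iri: $source_iri}) \
MATCH (t:OwlClass {iri: $target_iri}) \
MERGE (s)-[r:RELATES {relationship_type: $relationship_type}]->(t) \
SET r.owl_property_iri = $owl_property_iri";

/// Parameter list for `RELATES_CYPHER`, in a driver-agnostic form.
pub fn relates_params(
    source_iri: &str,
    target_iri: &str,
    rel: RelationshipType,
) -> Vec<(&'static str, Option<String>)> {
    vec![
        ("source_iri", Some(source_iri.to_string())),
        ("target_iri", Some(target_iri.to_string())),
        ("relationship_type", Some(rel.as_property().to_string())),
        ("owl_property_iri", rel.owl_property_iri().map(str::to_string)),
    ]
}
```

Keeping the type in an edge property rather than in the relationship label matches the `:RELATES {relationship_type: ...}` shape above and lets 1.3 load every non-hierarchical edge with a single pattern.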

#### 1.2 Materialise SubClassOf axioms as SUBCLASS_OF edges
**File**: `src/adapters/neo4j_ontology_repository.rs`
**Change**: After whelk reasoning, run:
```cypher
// Materialise SubClassOf axioms as edges between the classes they name.
MATCH (a:OwlAxiom {axiom_type: "SubClassOf"})
MATCH (s:OwlClass {iri: a.subject})
MATCH (o:OwlClass {iri: a.object})
// Merge on the pattern alone so an existing asserted edge is reused;
// only edges created here are flagged as inferred.
MERGE (s)-[r:SUBCLASS_OF]->(o)
ON CREATE SET r.is_inferred = true
```
**Impact**: Transitive closure edges from 110K axioms

#### 1.3 Load ALL relationship types in load_graph()
**File**: `src/adapters/neo4j_adapter.rs` → `load_graph()`
**Change**: After loading EDGE relationships, also query:
```cypher
MATCH (s)-[r:RELATES|SUBCLASS_OF]->(t) WHERE ...
```
Map to Edge objects with appropriate edge_type and weight.
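
One way to picture the mapping step is the sketch below. It assumes the six edge categories from section 2.1 and a hypothetical `Edge` struct; the grouping of relationship names into categories and the default weights are illustrative assumptions, not values taken from the codebase.

```rust
/// Edge categories, using the numeric codes given in section 2.1.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EdgeType {
    Explicit = 0,
    Subclass = 1,
    Structural = 2,
    Dependency = 3,
    Associative = 4,
    Bridge = 5,
}

/// Hypothetical in-memory edge; the real `Edge` type may carry more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Edge {
    pub source: u32,
    pub target: u32,
    pub edge_type: EdgeType,
    pub weight: f32,
}

/// Classify a Neo4j relationship by its label and optional
/// `relationship_type` property. The grouping is an assumption.
pub fn classify(label: &str, relationship_type: Option<&str>) -> EdgeType {
    match (label, relationship_type) {
        ("SUBCLASS_OF", _) => EdgeType::Subclass,
        ("RELATES", Some("has_part")) => EdgeType::Structural,
        ("RELATES", Some("requires" | "depends_on" | "enables")) => EdgeType::Dependency,
        ("RELATES", Some("bridges_to" | "bridges_from")) => EdgeType::Bridge,
        ("RELATES", _) => EdgeType::Associative,
        // Plain EDGE relationships (wikilinks) and anything unrecognised.
        _ => EdgeType::Explicit,
    }
}

/// Placeholder weights per category; real values would need tuning.
pub fn default_weight(edge_type: EdgeType) -> f32 {
    match edge_type {
        EdgeType::Subclass => 1.0,
        EdgeType::Structural => 0.9,
        EdgeType::Dependency => 0.7,
        EdgeType::Explicit => 0.6,
        EdgeType::Bridge => 0.5,
        EdgeType::Associative => 0.4,
    }
}

/// Build an `Edge` from one row of the relationship query.
pub fn to_edge(source: u32, target: u32, label: &str, relationship_type: Option<&str>) -> Edge {
    let edge_type = classify(label, relationship_type);
    Edge { source, target, edge_type, weight: default_weight(edge_type) }
}
```

Falling back to `Explicit` for unknown labels keeps existing wikilink behaviour unchanged, which fits the additive intent of Phase 1.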

### Phase 2: GPU Pipeline (Fix DL3, DL4)
*Goal: Edge types and analytics reach the GPU and flow back*

#### 2.1 Extend CSR with edge type buffer
**File**: `src/utils/unified_gpu_compute/construction.rs`
**Change**: Add `edge_types: DeviceBuffer<u8>` parallel to `edge_col_indices`. Upload edge type enum (0=explicit, 1=subclass, 2=structural, 3=dependency, 4=associative, 5=bridge).
**Impact**: `semantic_forces.cu` can read edge types for weighted springs
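
The idea of a parallel edge type array can be shown on the host side before any GPU upload. The sketch below is an assumption-laden illustration: `Csr` and `build_csr` are hypothetical names, plain `Vec`s stand in for device buffers, and the input is assumed to be `(source, target, weight, type_code)` tuples with every `source` below `num_nodes`.

```rust
/// Host-side CSR arrays. `edge_types[i]` describes the same edge as
/// `edge_col_indices[i]` and `edge_weights[i]`.
#[derive(Debug, Clone, PartialEq)]
pub struct Csr {
    pub row_offsets: Vec<u32>,      // length = num_nodes + 1
    pub edge_col_indices: Vec<u32>, // target node per edge
    pub edge_weights: Vec<f32>,
    pub edge_types: Vec<u8>,        // 0=explicit .. 5=bridge
}

/// Build CSR from an edge list with a counting sort on the source node.
/// Panics if any source index is >= num_nodes.
pub fn build_csr(num_nodes: usize, edges: &[(u32, u32, f32, u8)]) -> Csr {
    // Pass 1: count outgoing edges per node, shifted by one slot.
    let mut row_offsets = vec![0u32; num_nodes + 1];
    for &(src, _, _, _) in edges {
        row_offsets[src as usize + 1] += 1;
    }
    // Prefix sum turns counts into row start offsets.
    for i in 0..num_nodes {
        row_offsets[i + 1] += row_offsets[i];
    }

    // Pass 2: scatter each edge into its row, keeping all three
    // per-edge arrays aligned on the same slot.
    let mut cursor: Vec<u32> = row_offsets[..num_nodes].to_vec();
    let mut edge_col_indices = vec![0u32; edges.len()];
    let mut edge_weights = vec![0f32; edges.len()];
    let mut edge_types = vec![0u8; edges.len()];
    for &(src, dst, weight, type_code) in edges {
        let slot = cursor[src as usize] as usize;
        edge_col_indices[slot] = dst;
        edge_weights[slot] = weight;
        edge_types[slot] = type_code;
        cursor[src as usize] += 1;
    }

    Csr { row_offsets, edge_col_indices, edge_weights, edge_types }
}
```

Because the type array shares indexing with the column array, a kernel that already walks `row_offsets[n]..row_offsets[n + 1]` can read the type at the same index without a layout change to the existing buffers.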

#### 2.2 Wire ClusteringActor → app_state.node_analytics
**File**: `src/actors/gpu/clustering_actor.rs`
**Change**: After computing cluster assignments, send results to `ClientCoordinatorActor` or write directly to `app_state.node_analytics`.
**Impact**: Binary protocol V3 carries real cluster_id/anomaly_score
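
The "write directly" option might look like the sketch below. The real type of `app_state.node_analytics` is not shown in this PRD, so the `NodeAnalytics` struct, the `NodeAnalyticsMap` alias, and both publish functions are hypothetical stand-ins built on the standard library only.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Per-node analytics carried by the V3 wire format (field names from DL4).
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct NodeAnalytics {
    pub cluster_id: u32,
    pub anomaly_score: f32,
    pub community_id: u32,
}

/// Assumed shape of the shared store: node id -> analytics.
pub type NodeAnalyticsMap = Arc<RwLock<HashMap<u32, NodeAnalytics>>>;

/// Write cluster assignments, leaving other fields of each entry untouched.
pub fn publish_cluster_ids(store: &NodeAnalyticsMap, assignments: &[(u32, u32)]) {
    let mut map = store.write().expect("node_analytics lock poisoned");
    for &(node_id, cluster_id) in assignments {
        map.entry(node_id).or_default().cluster_id = cluster_id;
    }
}

/// Write anomaly scores the same way, so both actors can share one store.
pub fn publish_anomaly_scores(store: &NodeAnalyticsMap, scores: &[(u32, f32)]) {
    let mut map = store.write().expect("node_analytics lock poisoned");
    for &(node_id, score) in scores {
        map.entry(node_id).or_default().anomaly_score = score;
    }
}
```

Updating one field per call means a clustering pass does not reset scores written earlier by anomaly detection, and the write lock is held only for the duration of one batch.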

### Phase 3: Semantic Forces Activation (Fix DL7)
*Goal: Existing CUDA kernels compute forces from semantic structure*

#### 3.1 Feed OntologyConstraintActor with axiom data
**File**: `src/actors/gpu/ontology_constraint_actor.rs`
**Change**: On graph reload, query OwlAxioms from Neo4j, run through `OntologyConstraintTranslator`, upload constraint buffer to GPU.
**Impact**: DisjointClasses push apart, SubClassOf clusters together, SameAs merges
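
The three effects named in the impact line can be expressed as a small translation table. The sketch below is not the existing `OntologyConstraintTranslator` (which handles 5 constraint types and whose API is not shown here); `Axiom`, `ConstraintKind`, `Constraint`, and `translate` are hypothetical, and the strength values are placeholders.

```rust
/// The three axiom kinds mentioned in the impact line, over node indices.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Axiom {
    DisjointClasses { a: u32, b: u32 },
    SubClassOf { child: u32, parent: u32 },
    SameAs { a: u32, b: u32 },
}

/// Physics effect each axiom kind maps to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConstraintKind {
    Separation, // push the pair apart
    Attraction, // pull the child toward the parent
    Colocation, // draw the pair to the same position
}

/// One pairwise constraint, ready to be packed into a GPU buffer.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Constraint {
    pub kind: ConstraintKind,
    pub node_a: u32,
    pub node_b: u32,
    pub strength: f32,
}

/// Translate axioms into pairwise constraints, skipping self-pairs.
pub fn translate(axioms: &[Axiom]) -> Vec<Constraint> {
    axioms
        .iter()
        .filter_map(|axiom| {
            // Strengths are illustrative defaults, not tuned values.
            let (kind, node_a, node_b, strength) = match *axiom {
                Axiom::DisjointClasses { a, b } => (ConstraintKind::Separation, a, b, 1.0),
                Axiom::SubClassOf { child, parent } => {
                    (ConstraintKind::Attraction, child, parent, 0.5)
                }
                Axiom::SameAs { a, b } => (ConstraintKind::Colocation, a, b, 2.0),
            };
            // A node constrained against itself exerts no force.
            (node_a != node_b).then(|| Constraint { kind, node_a, node_b, strength })
        })
        .collect()
}
```

Resolving IRIs to node indices before this step keeps the translation pure and easy to test on graph reload.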

#### 3.2 Activate SemanticForcesActor type clustering
**File**: `src/actors/gpu/semantic_forces_actor.rs`
**Change**: Forward `source_domain` from node metadata as `type_id` to the GPU kernel. Configure `TypeClusterConfig` with per-domain centroids.
**Impact**: Nodes cluster by domain (AI/BC/MV/RB) in 3D space
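
A possible shape for the domain-to-`type_id` mapping and the centroids is sketched below. The fields of `TypeClusterConfig` are not shown in this PRD, so the sketch stops at plain functions; the function names, the id ordering, and the choice of tetrahedron corners as centroids are all assumptions made for illustration.

```rust
/// The four domains named in the PRD, in an assumed id order.
pub const DOMAINS: [&str; 4] = ["AI", "BC", "MV", "RB"];

/// Map a node's `source_domain` to a numeric type id (case-insensitive).
/// Returns None for domains outside the known set.
pub fn type_id_for_domain(source_domain: &str) -> Option<u32> {
    let wanted = source_domain.trim().to_ascii_uppercase();
    DOMAINS.iter().position(|d| *d == wanted).map(|i| i as u32)
}

/// Four centroids on a sphere of the given radius, placed at the corners
/// of a regular tetrahedron so every pair of domains is equally far apart.
pub fn domain_centroids(radius: f32) -> [[f32; 3]; 4] {
    const CORNERS: [[f32; 3]; 4] = [
        [1.0, 1.0, 1.0],
        [1.0, -1.0, -1.0],
        [-1.0, 1.0, -1.0],
        [-1.0, -1.0, 1.0],
    ];
    // Each corner has length sqrt(3); rescale to the requested radius.
    let scale = radius / 3.0f32.sqrt();
    CORNERS.map(|corner| corner.map(|x| x * scale))
}

/// Centroid for one domain, if the domain is known.
pub fn centroid_for(source_domain: &str, radius: f32) -> Option<[f32; 3]> {
    type_id_for_domain(source_domain).map(|id| domain_centroids(radius)[id as usize])
}
```

Equal spacing avoids favouring any one domain visually; nodes with an unknown domain get no centroid and would be left to the ordinary force layout.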

### Phase 4: Client Integration (Fix DL5)
*Goal: Client renders semantic structure visually*

#### 4.1 Parse V3 analytics fields in TypeScript
**File**: `client/src/types/binaryProtocol.ts`
**Change**: Expose `cluster_id`, `anomaly_score`, `community_id` in `BinaryNodeData`.

#### 4.2 Colour nodes by cluster, edges by type
**File**: `client/src/features/graph/components/GraphManager.tsx`
**Change**: Use `cluster_id` for node colouring, `edge_type` for edge colour/width.

---

## 5. Execution Order

```
Phase 1.1 → Phase 1.3 → Phase 1.2 → rebuild → verify edge counts
Phase 2.1 → Phase 2.2 → rebuild → verify analytics flow
Phase 3.1 → Phase 3.2 → rebuild → verify spatial clustering
Phase 4.1 → Phase 4.2 → verify visual output
```

Each phase is independently testable. Each build verifies the previous phase works before adding the next.

---

## 6. Success Criteria

| Metric | Current | Target |
|--------|---------|--------|
| Client edges | 490 | 1,500+ |
| Node isolation | 62% | <15% |
| Edge types in graph | 1 (explicit_link) | 9+ |
| GPU cluster_id populated | 0% | 100% |
| Spatial domain clustering | None | Visible BC/AI/MV/RB groups |
| Ontology constraints active | 0 | SubClassOf + DisjointWith |
| Cluster hulls meaningful | 1 blob | 4-6 distinct domain hulls |
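
The node isolation row can be checked mechanically after each rebuild. The sketch below assumes "isolated" means a node that is an endpoint of no edge (self-loops not counted) and that every id in the edge list refers to a node counted in `num_nodes`; `isolation_ratio` is a hypothetical helper, not an existing function.

```rust
use std::collections::HashSet;

/// Fraction of nodes that appear in no edge, in the range 0.0..=1.0.
/// Self-loops do not count as a connection.
pub fn isolation_ratio(num_nodes: usize, edges: &[(u32, u32)]) -> f64 {
    if num_nodes == 0 {
        return 0.0;
    }
    let mut connected: HashSet<u32> = HashSet::new();
    for &(source, target) in edges {
        if source != target {
            connected.insert(source);
            connected.insert(target);
        }
    }
    // saturating_sub guards against edge lists that mention unknown ids.
    let isolated = num_nodes.saturating_sub(connected.len());
    isolated as f64 / num_nodes as f64
}

/// True when the ratio meets the <15% target from the table above.
pub fn meets_isolation_target(num_nodes: usize, edges: &[(u32, u32)]) -> bool {
    isolation_ratio(num_nodes, edges) < 0.15
}
```

Running this against the edge list sent to the client, rather than against Neo4j, measures the end of the pipeline where the 62% figure is observed.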

---

## 7. Files Modified (Estimated)

| Phase | Files | Lines Changed |
|-------|-------|---------------|
| 1.1 | neo4j_ontology_repository.rs | ~50 |
| 1.2 | neo4j_ontology_repository.rs | ~30 |
| 1.3 | neo4j_adapter.rs | ~40 |
| 2.1 | construction.rs, execution.rs, memory.rs | ~80 |
| 2.2 | clustering_actor.rs, app_state.rs | ~40 |
| 3.1 | ontology_constraint_actor.rs, graph_state_actor.rs | ~60 |
| 3.2 | semantic_forces_actor.rs, settings propagation | ~40 |
| 4.1 | binaryProtocol.ts, graph.worker.ts | ~30 |
| 4.2 | GraphManager.tsx, ClusterHulls.tsx | ~40 |
| **Total** | **~15 files** | **~410 lines** |

---

## 8. Risk Assessment

- **Low risk**: Phases 1.x are additive (more edges, no removal)
- **Medium risk**: Phase 2.1 (CSR extension) touches GPU memory layout
- **Low risk**: Phase 3.x activates existing code paths
- **Low risk**: Phase 4.x is client-only changes

No destructive changes. Each phase adds capability without removing existing functionality.
