Releases: ddse-foundation/cef
Release Notes - 0.6
Release Notes
Version 0.6 (Research) — December 7, 2025
Research Edition - Production Patterns Implemented, Not Hardened
This release transforms CEF from a single-backend prototype to a multi-backend framework with production patterns. While still research-grade, v0.6 implements foundational resilience, security, and pluggability that can evolve toward production readiness.
🎯 Release Highlights
- 5 Graph Store Backends - Neo4j, PostgreSQL AGE, PostgreSQL SQL, DuckDB, In-Memory
- 4 Vector Store Backends - Neo4j, PostgreSQL pgvector, DuckDB VSS, In-Memory
- Resilience Patterns - Retry, circuit breaker, timeout for embedding services
- Security Foundations - API-key auth, input sanitization, audit logging
- 178+ Integration Tests - Real infrastructure via Testcontainers (no mocks)
- Docker Compose - Neo4j, PostgreSQL+pgvector, Apache AGE, MinIO
✨ New Features
Pluggable Graph Stores (IDR-004)
| Store | Config Value | Backend | Tests |
|---|---|---|---|
| Neo4jGraphStore | neo4j | Neo4j 5.x Community | 18 tests |
| PgAgeGraphStore | pg-age | PostgreSQL + Apache AGE | 18 tests |
| PgSqlGraphStore | pg-sql | Pure PostgreSQL SQL | 18 tests |
| DuckDbGraphStore | duckdb | DuckDB embedded | Default |
| InMemoryGraphStore | in-memory | JGraphT | Development |
Pluggable Vector Stores
| Store | Config Value | Backend | Notes |
|---|---|---|---|
| Neo4jChunkStore | neo4j | Neo4j vector indexes | Unified with Neo4j graph |
| R2dbcChunkStore | postgresql | PostgreSQL pgvector | Reactive R2DBC |
| DuckDbChunkStore | duckdb | DuckDB VSS | Default |
| InMemoryChunkStore | in-memory | ConcurrentHashMap | Development |
Dual-Store Configuration
Graph and vector stores are independently configurable:
cef:
  graph:
    store: neo4j # neo4j | pg-age | pg-sql | duckdb | in-memory
  vector:
    store: neo4j # neo4j | postgresql | duckdb | in-memory
Tested Backend Combinations
| Profile | Graph Store | Vector Store | Use Case |
|---|---|---|---|
| in-memory | in-memory | in-memory | Development, CI/CD |
| duckdb | duckdb | duckdb | Default, embedded |
| neo4j | neo4j | neo4j | Production graphs |
| pg-sql | pg-sql | postgresql | Max PostgreSQL compatibility |
| pg-age | pg-age | postgresql | Cypher on PostgreSQL |
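The tested pairings in the table can be encoded as a small lookup, for example so that a startup check could warn when an untested pairing is configured. This is only an illustrative sketch: `BackendCombinations` and `isTested` are hypothetical names and are not part of CEF.

```java
import java.util.Map;

// Hypothetical helper (not part of CEF): records the graph/vector pairings
// listed as tested in the table above.
final class BackendCombinations {
    // Key: cef.graph.store value; value: the cef.vector.store it was tested with.
    private static final Map<String, String> TESTED = Map.of(
            "in-memory", "in-memory",
            "duckdb", "duckdb",
            "neo4j", "neo4j",
            "pg-sql", "postgresql",
            "pg-age", "postgresql");

    private BackendCombinations() {}

    /** Returns true only for pairings that appear in the tested-combinations table. */
    static boolean isTested(String graphStore, String vectorStore) {
        // Map.of(...) rejects null keys, so nulls are filtered out first.
        return graphStore != null
                && vectorStore != null
                && vectorStore.equals(TESTED.get(graphStore));
    }

    public static void main(String[] args) {
        System.out.println(isTested("pg-age", "postgresql")); // true
        System.out.println(isTested("neo4j", "duckdb"));      // false
    }
}
```

An untested pairing is not necessarily broken, since the two stores are configured independently; the check only distinguishes what the release notes list as exercised.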
Resilience Infrastructure
- CefResilienceProperties.java - Externalized configuration
- CefResilienceAutoConfiguration.java - Auto-configuration for Resilience4j
- ResilientEmbeddingService.java - Wrapper with retry, circuit breaker, timeout
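The retry part of the wrapper described above can be illustrated with plain JDK code. This is a minimal sketch, not CEF's implementation: CEF delegates to Resilience4j, whereas `RetrySketch` and `callWithRetry` are hypothetical names, and the circuit breaker and timeout are deliberately left out.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical plain-JDK sketch of fixed-wait retry; CEF itself uses Resilience4j.
final class RetrySketch {
    private RetrySketch() {}

    /**
     * Calls the task until it succeeds or maxAttempts calls have failed,
     * sleeping for waitDuration between attempts. Rethrows the last failure.
     */
    static <T> T callWithRetry(Callable<T> task, int maxAttempts, Duration waitDuration)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                // Wait only if another attempt will follow.
                if (attempt < maxAttempts) {
                    Thread.sleep(waitDuration.toMillis());
                }
            }
        }
        throw last;
    }
}
```

Calling `callWithRetry(task, 3, Duration.ofSeconds(1))` would mirror the `max-attempts: 3` and `wait-duration: 1s` values used in the configuration for this release.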
cef:
  resilience:
    embedding:
      retry:
        max-attempts: 3
        wait-duration: 1s
      circuit-breaker:
        failure-rate-threshold: 50
      timeout: 30s
Thread Safety
- ThreadSafeKnowledgeGraph.java - ReadWriteLock wrapper for InMemoryKnowledgeGraph
- 21 concurrent tests including stress tests
- Opt-in via cef.graph.thread-safe=true
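The ReadWriteLock wrapper pattern mentioned above can be sketched as follows. This is not the ThreadSafeKnowledgeGraph source: `LockedNodeStore` is a hypothetical class, and a plain `HashMap` stands in for the wrapped, non-thread-safe graph.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the ReadWriteLock wrapper pattern; the HashMap
// stands in for the wrapped in-memory graph.
final class LockedNodeStore {
    private final Map<String, String> nodes = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    /** Writers take the exclusive write lock. */
    void put(String id, String label) {
        lock.writeLock().lock();
        try {
            nodes.put(id, label);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Readers share the read lock, so concurrent lookups do not block each other. */
    String get(String id) {
        lock.readLock().lock();
        try {
            return nodes.get(id);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Size is a read, so it also uses the shared read lock. */
    int size() {
        lock.readLock().lock();
        try {
            return nodes.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

The design trade-off is that reads scale across threads while every write briefly excludes all other access, which suits read-heavy retrieval workloads.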
Security Foundations
- CefSecurityProperties.java - Security configuration (JWT, API-Key, OAuth2)
- InputSanitizer.java - SQL/Cypher injection, XSS, prompt injection prevention
- SecurityAuditLogger.java - Audit logging for security events
- CefExceptionHandler.java - Sanitized error responses
- 49 tests for security components
Input Validation
- ValidatedRetrievalRequest.java - JSR-380 validated retrieval DTO
- ValidatedNodeInput.java - JSR-380 validated node input DTO
- ValidatedEdgeInput.java - JSR-380 validated edge input DTO
- 29 validation tests
Configuration Hardening
- Enhanced CefProperties.java with JSR-380 validation constraints
- Validation for: dimension (128-4096), token budget (100-128K), batch size, cache TTL
- 18 validation tests
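The range constraints listed above amount to simple bounds checks. The sketch below expresses two of them in plain Java so it has no dependencies; CEF declares them as JSR-380 annotations on CefProperties, which is not reproduced here. `ConfigRanges` and its methods are hypothetical, and the reading of "128K" as 128,000 is an assumption, since the notes do not say whether 128,000 or 131,072 is meant.

```java
// Hypothetical plain-Java equivalent of two of the documented range constraints.
final class ConfigRanges {
    static final int MIN_DIMENSION = 128;
    static final int MAX_DIMENSION = 4096;
    static final int MIN_TOKEN_BUDGET = 100;
    // Assumption: "128K" is interpreted as 128,000 tokens.
    static final int MAX_TOKEN_BUDGET = 128_000;

    private ConfigRanges() {}

    /** True when the embedding dimension lies in the documented 128-4096 range. */
    static boolean isValidDimension(int dimension) {
        return dimension >= MIN_DIMENSION && dimension <= MAX_DIMENSION;
    }

    /** True when the token budget lies in the documented 100-128K range. */
    static boolean isValidTokenBudget(int tokens) {
        return tokens >= MIN_TOKEN_BUDGET && tokens <= MAX_TOKEN_BUDGET;
    }
}
```

For instance, the 768-dimension nomic-embed-text embeddings used elsewhere in these notes fall inside the accepted dimension range.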
Observability
- CefHealthIndicator.java - Combined health indicator for CEF components
- KnowledgeGraphHealthIndicator.java - Graph-specific health checks
- CefMetrics.java - Micrometer metrics binder for graph statistics
🐳 Docker Compose Updates
New services for v0.6:
# Neo4j (Graph Store)
docker-compose up -d neo4j
# Access: http://localhost:7474 (neo4j/cef_password)
# PostgreSQL + AGE (Graph Store)
docker-compose --profile age up -d postgres-age
# Access: localhost:5433
# Full stack
docker-compose --profile age --profile minio up -d
⚠️ Known Limitations (Research Edition)
See KNOWN_ISSUES.md for complete list:
- Security defaults OFF - Must opt-in via cef.security.enabled=true
- PgAGE query safety - Manual Cypher escaping, needs parameterization
- Resilience coverage - Only embeddings have retry/CB/timeout
- Observability gaps - No health indicators for Neo4j/Pg stores
📊 Test Coverage
| Category | Tests | Notes |
|---|---|---|
| Neo4j Integration | 18 | Testcontainers |
| PostgreSQL AGE | 18 | Testcontainers |
| PostgreSQL SQL | 18 | Testcontainers |
| Security | 49 | InputSanitizer, AuditLogger |
| Validation | 29 | JSR-380 DTOs |
| Thread Safety | 21 | Concurrent stress tests |
| Resilience | 7 | Real Ollama |
| Configuration | 18 | CefProperties validation |
| Total New | 178+ | All passing |
📚 Documentation Updates
- Updated README.md with v0.6 features
- Updated USER_GUIDE.md with graph store selection
- Updated QUICKSTART.md with new Docker Compose options
- Updated ARCHITECTURE.md with storage architecture
- Added ddse/v0.6/IDR-004.md (Implementation Decision Record)
Release Notes - beta-0.5
Release Notes
Version beta-0.5 (November 27, 2025)
First Public Beta Release
This is the initial beta release of the Context Engineering Framework (CEF) from DDSE Foundation. CEF provides an ORM-like abstraction for LLM context engineering, managing knowledge models through dual persistence (graph + vector stores).
🎯 Release Highlights
- ORM for Context Engineering - Define knowledge models (nodes, edges) like JPA entities
- Dual Persistence - Automatic management of graph and vector stores
- Intelligent Context Assembly - 3-level strategy (relationship navigation → semantic → keyword)
- Standard Patterns - Repository layer, service patterns, lifecycle hooks
- Comprehensive Documentation - USER_GUIDE, ARCHITECTURE, examples
Note: This is a Community/Research Release. It is optimized for ease of use and experimentation, not for high-concurrency enterprise production environments. It is ideal for "Disposable Research Pods"—ephemeral environments spun up to analyze specific datasets (e.g., financial audits, clinical trials) and then discarded.
✨ Core Features
Knowledge Model ORM
- Entity Persistence - Node and Edge entities with JSONB properties
- Relationship Navigation - Multi-hop graph traversal with semantic filtering
- Vectorizable Content - Automatic embedding generation and persistence
- RelationType System - Semantic hints (HIERARCHY, CAUSALITY, ASSOCIATION, etc.)
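Multi-hop relationship navigation can be pictured as a depth-limited breadth-first traversal. The sketch below shows only that idea over a plain adjacency map; it omits semantic filtering and is not CEF's traversal code. `MultiHopSketch` and `reachable` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: depth-limited BFS over a directed adjacency map.
final class MultiHopSketch {
    private MultiHopSketch() {}

    /**
     * Returns every node reachable from start in at most maxDepth hops,
     * excluding start itself, in breadth-first discovery order.
     */
    static Set<String> reachable(Map<String, List<String>> adjacency, String start, int maxDepth) {
        Set<String> visited = new LinkedHashSet<>();
        visited.add(start);
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        // Expand one hop per iteration until the depth limit or an empty frontier.
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String node : frontier) {
                for (String neighbour : adjacency.getOrDefault(node, List.of())) {
                    if (visited.add(neighbour)) {
                        next.add(neighbour);
                    }
                }
            }
            frontier = next;
        }
        visited.remove(start);
        return visited;
    }
}
```

With edges patient → diabetes → metformin, a depth of 1 reaches only the condition, while a depth of 2 also reaches the medication.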
Storage Backends (Pluggable)
- ✅ DuckDB - Embedded database (default, tested)
- ✅ JGraphT - In-memory graph store (default, tested)
- ⚠️ PostgreSQL - External database with pgvector (configured, untested)
- ⚠️ Neo4j - Graph database for large-scale deployments (configured, untested)
- ⚠️ Qdrant - Vector database (configured, untested)
- ⚠️ Pinecone - Cloud vector database (configured, untested)
LLM Integration
- ✅ vLLM - Production inference server with Qwen3-Coder-30B-A3B-Instruct-FP8 (tested)
- ✅ Ollama Embeddings - nomic-embed-text model, 768 dimensions (tested)
- ⚠️ OpenAI - GPT-4, GPT-3.5 Turbo (configured, untested)
- ⚠️ Ollama LLM - Llama 3.x models (configured, untested)
Context Assembly
- Pattern-Based Retrieval - GraphPattern with TraversalStep and Constraint
- Multi-Hop Reasoning - Configurable depth (1-5 hops)
- 3-Level Fallback - Graph → Hybrid → Vector-only
- Semantic Filtering - Relationship semantics-aware traversal
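The 3-level fallback can be read as trying strategies in order until one returns enough context. The release notes do not state what triggers a fallback, so the sketch below assumes a minimum result count as the trigger; that threshold, `FallbackSketch`, and `firstSufficient` are all hypothetical.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of an ordered fallback chain (e.g. graph, hybrid, vector-only).
final class FallbackSketch {
    private FallbackSketch() {}

    /**
     * Runs the strategies in order and returns the first result with at least
     * minResults entries. If none qualifies, returns the last strategy's result.
     * Assumption: "too few results" is what triggers a fallback.
     */
    static List<String> firstSufficient(List<Supplier<List<String>>> strategies, int minResults) {
        List<String> last = List.of();
        for (Supplier<List<String>> strategy : strategies) {
            last = strategy.get();
            if (last.size() >= minResults) {
                return last;
            }
        }
        return last;
    }
}
```

Because the strategies are suppliers, later levels are only executed when earlier ones fall short, which keeps the cheaper path cheap.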
Developer Experience
- Repository Pattern - Domain-specific facades over ORM layer
- Service Layer - Business logic separation with transaction support
- Reactive API - Spring WebFlux + R2DBC for non-blocking I/O
- Configuration - YAML-based with sensible defaults
📦 What's Included
Framework (cef-framework)
<dependency>
  <groupId>org.ddse.ml</groupId>
  <artifactId>cef-framework</artifactId>
  <version>beta-0.5</version>
</dependency>
- KnowledgeIndexer - Entity persistence (like EntityManager)
- KnowledgeRetriever - Context queries (like Repository)
- GraphStore - Pluggable graph backend interface
- VectorStore - Pluggable vector backend interface
- Node, Edge, Chunk, RelationType - Core domain entities
- GraphPattern, TraversalStep, Constraint - Query DSL
Comprehensive Test Suite
- Medical Domain: 150 patients, 5 conditions, 7 medications, 15 doctors (177 nodes, 455 edges)
- Financial Domain: SAP-simulated data (vendors, materials, purchase orders, invoices)
- Benchmarks: 4 complex scenarios proving Knowledge Model superiority
- Results: 60-220% improvement over vector-only search (see docs/EVALUATION_SUMMARY.md)
Documentation
- USER_GUIDE.md - Complete ORM integration guide (30KB, 1,200 lines)
- ARCHITECTURE.md - Technical deep dive
- QUICKSTART.md - Getting started in 5 minutes
- KNOWN_ISSUES.md - Testing status and limitations
- README.md - Project overview
🧪 Testing Status
Thoroughly Tested ✅
- DuckDB embedded database
- JGraphT in-memory graph (up to 100K nodes)
- vLLM with Qwen3-Coder-30B-A3B-Instruct-FP8
- Ollama embeddings (nomic-embed-text, 768 dimensions)
- Pattern-based retrieval with multi-hop reasoning
- Medical domain example with benchmarks
Configured but Untested ⚠️
- PostgreSQL + pgvector
- Neo4j graph database
- OpenAI GPT models
- Ollama LLM models (Llama 3.x)
- Qdrant vector database
- Pinecone vector database
See KNOWN_ISSUES.md for details.
🚀 Getting Started
Prerequisites
- Java 17+
- Maven 3.8+
- Docker & Docker Compose
Quick Start
# Clone repository
git clone <repository-url>
cd cef
# Start services (Ollama for embeddings)
docker-compose up -d
# Note: vLLM (Qwen3-Coder-30B) required for benchmark reproduction
# See https://docs.vllm.ai/ for installation
# Build framework
mvn clean install
# Run test suite (includes benchmarks)
cd cef-framework
mvn test
# View benchmark results
open docs/benchmark_comparison.png
open docs/EVALUATION_SUMMARY.md
Example Usage
// Define knowledge model
Node patient = new Node(null, "Patient",
    Map.of("name", "John", "age", 45),
    "Patient John with diabetes");

// Persist entity
indexer.indexNode(patient).block();

// Define relationship
Edge hasCondition = new Edge(null, "HAS_CONDITION",
    patientId, diabetesId, null, 1.0);
indexer.indexEdge(hasCondition).block();

// Query context
SearchResult result = retriever.retrieve(
    RetrievalRequest.builder()
        .query("diabetes treatments")
        .depth(2)
        .topK(10)
        .build()
);
🏆 Benchmark Results
Comprehensive test suite validates Knowledge Model superiority over vector-only approaches.
Test Domains
- Medical Clinical Decision Support
  - 177 nodes: Patients, Conditions, Medications, Doctors
  - 455 edges: Multi-hop relationships
  - 4 complex scenarios: Contraindication discovery, behavioral patterns, cascading risks, transitive exposure
  - Results: 60-220% improvement, 120% average
- SAP ERP Organizational Structure
  - 80+ records: Departments, Cost Centers, Projects, Vendors, Materials, Invoices
  - 14 CSV entities with organizational hierarchies
  - 2 scenarios: Cross-project resource allocation, cost center contagion analysis
  - Results: 60% improvement (both scenarios), proves Graph RAG advantage for structural patterns
Key Findings
Medical Domain:
| Scenario | Vector-Only | Knowledge Model | Improvement |
|---|---|---|---|
| Multi-hop contraindication | 5 chunks | 12 chunks | +140% |
| Behavioral risk patterns | 5 chunks | 8 chunks | +60% |
| Cascading side effects | 5 chunks | 8 chunks | +60% |
| Transitive exposure risk | 5 chunks | 16 chunks | +220% 🔥 |
| Average | 5.0 chunks | 11.0 chunks | +120% |
SAP ERP Domain:
| Scenario | Vector-Only | Knowledge Model | Improvement |
|---|---|---|---|
| Cross-project resource allocation | 5 chunks | 8 chunks | +60% |
| Cost center contagion analysis | 5 chunks | 8 chunks | +60% |
| Average | 5.0 chunks | 8.0 chunks | +60% |
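The Improvement column in both tables is the relative increase in retrieved chunks over the vector-only baseline. The sketch below reproduces that arithmetic; `ImprovementSketch` and `percentImprovement` are hypothetical names used only for illustration.

```java
// Hypothetical helper showing how the Improvement column is derived.
final class ImprovementSketch {
    private ImprovementSketch() {}

    /** Relative improvement of candidate over baseline, in percent. */
    static double percentImprovement(double baseline, double candidate) {
        if (baseline <= 0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        // Multiply before dividing so the table's integer inputs stay exact.
        return (candidate - baseline) * 100.0 / baseline;
    }

    public static void main(String[] args) {
        System.out.println(percentImprovement(5, 12)); // 140.0 (multi-hop contraindication)
        System.out.println(percentImprovement(5, 16)); // 220.0 (transitive exposure risk)
        System.out.println(percentImprovement(5, 11)); // 120.0 (medical average)
    }
}
```

Note that the metric counts retrieved chunks, so it measures how much more context the graph-aware retrieval surfaces rather than answer accuracy.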
Cross-Domain Insight:
Graph RAG wins for structural organizational patterns (Department→CostCenter hierarchies, funding networks). Graph RAG equals vector search for semantically explicit relationships (supply chain descriptions already mentioning vendor-component connections).
Latency:
- Medical: 26ms avg (+19.5% vs vector-only 22ms)
- SAP: 43ms avg (+23.2% vs vector-only 35ms)
- Conclusion: <25% overhead acceptable for 60-220% content improvement
Visualizations:
- Medical: cef-framework/src/test/resources/scripts/results/benchmark_comparison.png
- SAP: cef-framework/src/test/resources/scripts/results/sap_benchmark_comparison.png
See docs/EVALUATION_SUMMARY.md for detailed multi-domain analysis.
📊 Performance Characteristics
Tested Configuration: DuckDB + JGraphT + vLLM (Qwen3-Coder-30B) + Ollama (nomic-embed-text)
| Operation | Performance | Notes |
|---|---|---|
| Node indexing | <50ms per node | Single insert |
| Batch indexing | ~2s per 1000 nodes | Transactional batch |
| Graph traversal (depth 2) | <50ms | JGraphT in-memory |
| Vector search (10K chunks) | ~100ms | DuckDB brute-force |
| Hybrid assembly | ~150ms | Graph + vector combined |
| Embedding generation | ~200ms per chunk | Ollama nomic-embed-text |
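The "brute-force" note on vector search means every stored vector is compared with the query, which is why cost grows linearly with the number of chunks. The sketch below illustrates an exhaustive top-K scan in plain Java. It is not DuckDB's implementation, the choice of cosine similarity is an assumption (the notes do not name the metric), and `BruteForceSearch` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of exhaustive (brute-force) nearest-neighbour search.
final class BruteForceSearch {
    private BruteForceSearch() {}

    /** Cosine similarity; returns 0.0 when either vector has zero length. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the indices of the topK vectors most similar to query, best first. */
    static List<Integer> topK(List<float[]> vectors, float[] query, int topK) {
        // Score every vector once: this full scan is the brute-force part.
        double[] scores = new double[vectors.size()];
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < vectors.size(); i++) {
            scores[i] = cosine(vectors.get(i), query);
            indices.add(i);
        }
        indices.sort(Comparator.comparingDouble((Integer i) -> scores[i]).reversed());
        int limit = Math.max(0, Math.min(topK, indices.size()));
        return new ArrayList<>(indices.subList(0, limit));
    }
}
```

An index-based approach would avoid scoring every vector, at the cost of approximate results and extra build time.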
Benchmark Results (Medical Domain):
- Vector-only: 60 chunks retrieved
- Knowledge Model ORM: 132 chunks retrieved (120% improvement)
- Relationship-aware context with proper entity boundaries
🔧 Configuration Example
cef:
  # Storage backend
  database:
    type: duckdb # or postgresql
    duckdb:
      path: ./data/cef.duckdb
  # Graph store
  graph:
    store: jgrapht # or neo4j
    preload-on-startup: true
  # Vector store
  vector:
    store: duckdb # or postgres, qdrant, pinecone
  # LLM provider
  llm:
    default-provider: vllm # or ollama, openai
    vllm:
      base-url: http://localhost:8001
      model: Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8
  # Embedding provider
  embedding:
    provider: ollama # or openai
    model: nomic-embed-text
    dimension: 768
🐛 Known Limitations
- JGraphT Memory - Recommended maximum 100K nodes (~350MB)
- PostgreSQL Untested - Schema provided but not integration tested
- Concurrent Indexing -...