
Releases: ddse-foundation/cef

Release Notes - 0.6

07 Dec 17:52


Release Notes

Version 0.6 (Research) — December 7, 2025

Research Edition - Production Patterns Implemented, Not Hardened

This release transforms CEF from a single-backend prototype to a multi-backend framework with production patterns. While still research-grade, v0.6 implements foundational resilience, security, and pluggability that can evolve toward production readiness.


🎯 Release Highlights

  • 5 Graph Store Backends - Neo4j, PostgreSQL AGE, PostgreSQL SQL, DuckDB, In-Memory
  • 4 Vector Store Backends - Neo4j, PostgreSQL pgvector, DuckDB VSS, In-Memory
  • Resilience Patterns - Retry, circuit breaker, timeout for embedding services
  • Security Foundations - API-key auth, input sanitization, audit logging
  • 178+ Integration Tests - Real infrastructure via Testcontainers (no mocks)
  • Docker Compose - Neo4j, PostgreSQL+pgvector, Apache AGE, MinIO

✨ New Features

Pluggable Graph Stores (IDR-004)

| Store | Config Value | Backend | Tests |
|---|---|---|---|
| Neo4jGraphStore | neo4j | Neo4j 5.x Community | 18 tests |
| PgAgeGraphStore | pg-age | PostgreSQL + Apache AGE | 18 tests |
| PgSqlGraphStore | pg-sql | Pure PostgreSQL SQL | 18 tests |
| DuckDbGraphStore | duckdb | DuckDB embedded | Default |
| InMemoryGraphStore | in-memory | JGraphT | Development |

Pluggable Vector Stores

| Store | Config Value | Backend | Notes |
|---|---|---|---|
| Neo4jChunkStore | neo4j | Neo4j vector indexes | Unified with Neo4j graph |
| R2dbcChunkStore | postgresql | PostgreSQL pgvector | Reactive R2DBC |
| DuckDbChunkStore | duckdb | DuckDB VSS | Default |
| InMemoryChunkStore | in-memory | ConcurrentHashMap | Development |

Dual-Store Configuration

Graph and vector stores are independently configurable:

cef:
  graph:
    store: neo4j  # neo4j | pg-age | pg-sql | duckdb | in-memory
  vector:
    store: neo4j  # neo4j | postgresql | duckdb | in-memory

Tested Backend Combinations

| Profile | Graph Store | Vector Store | Use Case |
|---|---|---|---|
| in-memory | in-memory | in-memory | Development, CI/CD |
| duckdb | duckdb | duckdb | Default, embedded |
| neo4j | neo4j | neo4j | Production graphs |
| pg-sql | pg-sql | postgresql | Max PostgreSQL compatibility |
| pg-age | pg-age | postgresql | Cypher on PostgreSQL |
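The tested combinations can be read as a lookup from a graph/vector pair to a named profile. The following is a minimal sketch of that idea; `StoreProfile` and `forStores` are hypothetical names for illustration and are not part of CEF. Only the string values come from the listing above.

```java
import java.util.Optional;

/** Hypothetical enum (not a CEF class) encoding the tested combinations listed above. */
enum StoreProfile {
    IN_MEMORY("in-memory", "in-memory"),
    DUCKDB("duckdb", "duckdb"),
    NEO4J("neo4j", "neo4j"),
    PG_SQL("pg-sql", "postgresql"),
    PG_AGE("pg-age", "postgresql");

    final String graphStore;
    final String vectorStore;

    StoreProfile(String graphStore, String vectorStore) {
        this.graphStore = graphStore;
        this.vectorStore = vectorStore;
    }

    /** Returns the tested profile matching a graph/vector pair, or empty if the pair is not listed. */
    static Optional<StoreProfile> forStores(String graphStore, String vectorStore) {
        for (StoreProfile profile : values()) {
            if (profile.graphStore.equals(graphStore) && profile.vectorStore.equals(vectorStore)) {
                return Optional.of(profile);
            }
        }
        return Optional.empty();
    }
}
```

A pair that returns empty (for example a `neo4j` graph store with a `duckdb` vector store) may still be configurable, since the two stores are set independently, but it is not among the combinations listed as tested.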

Resilience Infrastructure

  • CefResilienceProperties.java - Externalized configuration
  • CefResilienceAutoConfiguration.java - Auto-configuration for Resilience4j
  • ResilientEmbeddingService.java - Wrapper with retry, circuit breaker, timeout

cef:
  resilience:
    embedding:
      retry:
        max-attempts: 3
        wait-duration: 1s
      circuit-breaker:
        failure-rate-threshold: 50
      timeout: 30s
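To make the retry and timeout settings concrete, here is a minimal plain-Java sketch of a retry loop with a per-attempt timeout. It is not the `ResilientEmbeddingService` implementation and does not use Resilience4j; `RetryWithTimeout` is a hypothetical helper, and the circuit breaker is left out to keep the example short.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not a CEF or Resilience4j class): retry with a per-attempt timeout. */
final class RetryWithTimeout {
    private final int maxAttempts;
    private final Duration waitDuration;
    private final Duration timeout;

    RetryWithTimeout(int maxAttempts, Duration waitDuration, Duration timeout) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.maxAttempts = maxAttempts;
        this.waitDuration = waitDuration;
        this.timeout = timeout;
    }

    /** Runs the call, retrying on failure or timeout; rethrows the last failure. */
    <T> T execute(Callable<T> call) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Run the call on another thread so this thread can enforce the timeout.
            CompletableFuture<T> future = CompletableFuture.supplyAsync(() -> {
                try {
                    return call.call();
                } catch (Exception e) {
                    throw new CompletionException(e);
                }
            });
            try {
                return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // stop waiting; the underlying task is not interrupted
                last = e;
            } catch (ExecutionException e) {
                Throwable cause = e.getCause();
                last = (cause instanceof Exception) ? (Exception) cause : e;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(waitDuration.toMillis()); // fixed wait between attempts
            }
        }
        throw last;
    }
}
```

Constructing it as `new RetryWithTimeout(3, Duration.ofSeconds(1), Duration.ofSeconds(30))` mirrors the `max-attempts`, `wait-duration`, and `timeout` values shown in the configuration.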

Thread Safety

  • ThreadSafeKnowledgeGraph.java - ReadWriteLock wrapper for InMemoryKnowledgeGraph
  • 21 concurrent tests including stress tests
  • Opt-in via cef.graph.thread-safe=true
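The ReadWriteLock wrapper pattern named above can be sketched as follows. This is an illustration under stated assumptions, not the `ThreadSafeKnowledgeGraph` source: `SimpleGraph` is a hypothetical stand-in for a non-thread-safe in-memory graph, and `LockedGraph` is a hypothetical wrapper.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a non-thread-safe in-memory graph. */
final class SimpleGraph {
    private final Map<String, Set<String>> adjacency = new HashMap<>();

    void addEdge(String from, String to) {
        adjacency.computeIfAbsent(from, k -> new HashSet<>()).add(to);
        adjacency.computeIfAbsent(to, k -> new HashSet<>());
    }

    Set<String> neighbors(String node) {
        // Return a copy so callers never see the internal mutable set.
        return new HashSet<>(adjacency.getOrDefault(node, Collections.emptySet()));
    }

    int nodeCount() {
        return adjacency.size();
    }
}

/** Hypothetical wrapper: many concurrent readers, one writer at a time. */
final class LockedGraph {
    private final SimpleGraph delegate = new SimpleGraph();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    void addEdge(String from, String to) {
        lock.writeLock().lock();
        try {
            delegate.addEdge(from, to);
        } finally {
            lock.writeLock().unlock();
        }
    }

    Set<String> neighbors(String node) {
        lock.readLock().lock();
        try {
            return delegate.neighbors(node);
        } finally {
            lock.readLock().unlock();
        }
    }

    int nodeCount() {
        lock.readLock().lock();
        try {
            return delegate.nodeCount();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

The trade-off is the usual one for this pattern: reads proceed in parallel, while every write briefly blocks all readers, which is a plausible reason for making the wrapper opt-in.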

Security Foundations

  • CefSecurityProperties.java - Security configuration (JWT, API-Key, OAuth2)
  • InputSanitizer.java - SQL/Cypher injection, XSS, prompt injection prevention
  • SecurityAuditLogger.java - Audit logging for security events
  • CefExceptionHandler.java - Sanitized error responses
  • 49 tests for security components
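One common building block for injection prevention is allowlist validation of identifiers such as node labels and relation types, which in many query languages cannot be bound as query parameters. The sketch below illustrates that technique only; it is not the `InputSanitizer` implementation, and `IdentifierGuard`, its pattern, and its 64-character limit are assumptions made for the example.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not a CEF class): accepts only plain identifiers. */
final class IdentifierGuard {
    // Letter or underscore first, then letters, digits, or underscores; at most 64 characters.
    private static final Pattern SAFE = Pattern.compile("[A-Za-z_][A-Za-z0-9_]{0,63}");

    /** Returns the identifier unchanged if it is safe to splice into a query, else throws. */
    static String requireSafe(String identifier) {
        if (identifier == null || !SAFE.matcher(identifier).matches()) {
            throw new IllegalArgumentException("Unsafe identifier");
        }
        return identifier;
    }
}
```

With this guard, `requireSafe("HAS_CONDITION")` passes through, while a value containing spaces, quotes, or parentheses is rejected before it reaches a query string. Property values are a different case and are better passed as bound parameters where the driver supports them.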

Input Validation

  • ValidatedRetrievalRequest.java - JSR-380 validated retrieval DTO
  • ValidatedNodeInput.java - JSR-380 validated node input DTO
  • ValidatedEdgeInput.java - JSR-380 validated edge input DTO
  • 29 validation tests

Configuration Hardening

  • Enhanced CefProperties.java with JSR-380 validation constraints
  • Validation for: dimension (128-4096), token budget (100-128K), batch size, cache TTL
  • 18 validation tests
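The ranges above amount to simple bounds checks at startup. The release implements them with JSR-380 constraints; the hand-rolled sketch below only shows the equivalent logic in plain Java. `ConfigBounds` is a hypothetical class, and reading "128K" as 128,000 is an assumption (it could also mean 131,072).

```java
/** Hypothetical helper (not a CEF class) mirroring the documented ranges. */
final class ConfigBounds {
    static final int MIN_DIMENSION = 128;
    static final int MAX_DIMENSION = 4096;
    static final int MIN_TOKEN_BUDGET = 100;
    static final int MAX_TOKEN_BUDGET = 128_000; // assumption: "128K" read as 128,000

    /** Returns the value if it lies in [min, max], otherwise throws with a descriptive message. */
    static int requireInRange(String name, int value, int min, int max) {
        if (value < min || value > max) {
            throw new IllegalArgumentException(
                name + " must be between " + min + " and " + max + " but was " + value);
        }
        return value;
    }

    static int dimension(int value) {
        return requireInRange("dimension", value, MIN_DIMENSION, MAX_DIMENSION);
    }

    static int tokenBudget(int value) {
        return requireInRange("token budget", value, MIN_TOKEN_BUDGET, MAX_TOKEN_BUDGET);
    }
}
```

For instance, the 768-dimension nomic-embed-text embeddings mentioned elsewhere in these notes would pass `dimension(768)`, while a value of 64 would be rejected.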

Observability

  • CefHealthIndicator.java - Combined health indicator for CEF components
  • KnowledgeGraphHealthIndicator.java - Graph-specific health checks
  • CefMetrics.java - Micrometer metrics binder for graph statistics

🐳 Docker Compose Updates

New services for v0.6:

# Neo4j (Graph Store)
docker-compose up -d neo4j
# Access: http://localhost:7474 (neo4j/cef_password)

# PostgreSQL + AGE (Graph Store)
docker-compose --profile age up -d postgres-age
# Access: localhost:5433

# Full stack
docker-compose --profile age --profile minio up -d

⚠️ Known Limitations (Research Edition)

See KNOWN_ISSUES.md for complete list:

  1. Security defaults OFF - Must opt-in via cef.security.enabled=true
  2. PgAGE query safety - Manual Cypher escaping, needs parameterization
  3. Resilience coverage - Only the embedding service has retry, circuit breaker, and timeout protection
  4. Observability gaps - No health indicators for Neo4j/Pg stores

📊 Test Coverage

| Category | Tests | Notes |
|---|---|---|
| Neo4j Integration | 18 | Testcontainers |
| PostgreSQL AGE | 18 | Testcontainers |
| PostgreSQL SQL | 18 | Testcontainers |
| Security | 49 | InputSanitizer, AuditLogger |
| Validation | 29 | JSR-380 DTOs |
| Thread Safety | 21 | Concurrent stress tests |
| Resilience | 7 | Real Ollama |
| Configuration | 18 | CefProperties validation |
| Total New | 178+ | All passing |

📚 Documentation Updates

  • Updated README.md with v0.6 features
  • Updated USER_GUIDE.md with graph store selection
  • Updated QUICKSTART.md with new Docker Compose options
  • Updated ARCHITECTURE.md with storage architecture
  • Added ddse/v0.6/IDR-004.md (Implementation Decision Record)

Release Notes - beta-0.5

02 Dec 12:25


Release Notes

Version beta-0.5 (November 27, 2025)

First Public Beta Release

This is the initial beta release of the Context Engineering Framework (CEF) from DDSE Foundation. CEF provides an ORM-like abstraction for LLM context engineering, managing knowledge models through dual persistence (graph + vector stores).


🎯 Release Highlights

  • ORM for Context Engineering - Define knowledge models (nodes, edges) like JPA entities
  • Dual Persistence - Automatic management of graph and vector stores
  • Intelligent Context Assembly - 3-level strategy (relationship navigation → semantic → keyword)
  • Standard Patterns - Repository layer, service patterns, lifecycle hooks
  • Comprehensive Documentation - USER_GUIDE, ARCHITECTURE, examples

Note: This is a Community/Research Release. It is optimized for ease of use and experimentation, not for high-concurrency enterprise production environments. It is ideal for "Disposable Research Pods"—ephemeral environments spun up to analyze specific datasets (e.g., financial audits, clinical trials) and then discarded.


✨ Core Features

Knowledge Model ORM

  • Entity Persistence - Node and Edge entities with JSONB properties
  • Relationship Navigation - Multi-hop graph traversal with semantic filtering
  • Vectorizable Content - Automatic embedding generation and persistence
  • RelationType System - Semantic hints (HIERARCHY, CAUSALITY, ASSOCIATION, etc.)

Storage Backends (Pluggable)

  • DuckDB - Embedded database (default, tested)
  • JGraphT - In-memory graph store (default, tested)
  • ⚠️ PostgreSQL - External database with pgvector (configured, untested)
  • ⚠️ Neo4j - Graph database for large-scale deployments (configured, untested)
  • ⚠️ Qdrant - Vector database (configured, untested)
  • ⚠️ Pinecone - Cloud vector database (configured, untested)

LLM Integration

  • vLLM - Production inference server with Qwen3-Coder-30B-A3B-Instruct-FP8 (tested)
  • Ollama Embeddings - nomic-embed-text model, 768 dimensions (tested)
  • ⚠️ OpenAI - GPT-4, GPT-3.5 Turbo (configured, untested)
  • ⚠️ Ollama LLM - Llama 3.x models (configured, untested)

Context Assembly

  • Pattern-Based Retrieval - GraphPattern with TraversalStep and Constraint
  • Multi-Hop Reasoning - Configurable depth (1-5 hops)
  • 3-Level Fallback - Graph → Hybrid → Vector-only
  • Semantic Filtering - Relationship semantics-aware traversal
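The 3-level fallback listed above can be pictured as an ordered chain of retrieval strategies. The notes do not state when one level gives way to the next, so the sketch below assumes the simplest rule: move on when a level returns no chunks. `FallbackRetriever` is a hypothetical class, not the CEF retriever.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch (not a CEF class): try each retrieval level in order. */
final class FallbackRetriever {
    private final List<Function<String, List<String>>> levels;

    /** Levels are tried in the given order, e.g. graph, then hybrid, then vector-only. */
    FallbackRetriever(List<Function<String, List<String>>> levels) {
        this.levels = List.copyOf(levels);
    }

    /** Returns the chunks from the first level that yields any; empty if none does. */
    List<String> retrieve(String query) {
        for (Function<String, List<String>> level : levels) {
            List<String> chunks = level.apply(query);
            if (chunks != null && !chunks.isEmpty()) {
                return chunks; // assumption: a non-empty result stops the fallback
            }
        }
        return List.of();
    }
}
```

A real implementation might instead fall through on a minimum result count or a relevance threshold; the ordering of levels is the only part taken from the notes.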

Developer Experience

  • Repository Pattern - Domain-specific facades over ORM layer
  • Service Layer - Business logic separation with transaction support
  • Reactive API - Spring WebFlux + R2DBC for non-blocking I/O
  • Configuration - YAML-based with sensible defaults

📦 What's Included

Framework (cef-framework)

<dependency>
    <groupId>org.ddse.ml</groupId>
    <artifactId>cef-framework</artifactId>
    <version>beta-0.5</version>
</dependency>

  • KnowledgeIndexer - Entity persistence (like EntityManager)
  • KnowledgeRetriever - Context queries (like Repository)
  • GraphStore - Pluggable graph backend interface
  • VectorStore - Pluggable vector backend interface
  • Node, Edge, Chunk, RelationType - Core domain entities
  • GraphPattern, TraversalStep, Constraint - Query DSL

Comprehensive Test Suite

  • Medical Domain: 150 patients, 5 conditions, 7 medications, 15 doctors (177 nodes, 455 edges)
  • Financial Domain: SAP-simulated data (vendors, materials, purchase orders, invoices)
  • Benchmarks: 4 complex scenarios proving Knowledge Model superiority
  • Results: 60-220% improvement over vector-only search (see docs/EVALUATION_SUMMARY.md)

Documentation

  • USER_GUIDE.md - Complete ORM integration guide (30KB, 1,200 lines)
  • ARCHITECTURE.md - Technical deep dive
  • QUICKSTART.md - Getting started in 5 minutes
  • KNOWN_ISSUES.md - Testing status and limitations
  • README.md - Project overview

🧪 Testing Status

Thoroughly Tested ✅

  • DuckDB embedded database
  • JGraphT in-memory graph (up to 100K nodes)
  • vLLM with Qwen3-Coder-30B-A3B-Instruct-FP8
  • Ollama embeddings (nomic-embed-text, 768 dimensions)
  • Pattern-based retrieval with multi-hop reasoning
  • Medical domain example with benchmarks

Configured but Untested ⚠️

  • PostgreSQL + pgvector
  • Neo4j graph database
  • OpenAI GPT models
  • Ollama LLM models (Llama 3.x)
  • Qdrant vector database
  • Pinecone vector database

See KNOWN_ISSUES.md for details.


🚀 Getting Started

Prerequisites

  • Java 17+
  • Maven 3.8+
  • Docker & Docker Compose

Quick Start

# Clone repository
git clone <repository-url>
cd cef

# Start services (Ollama for embeddings)
docker-compose up -d

# Note: vLLM (Qwen3-Coder-30B) required for benchmark reproduction
# See https://docs.vllm.ai/ for installation

# Build framework
mvn clean install

# Run test suite (includes benchmarks)
cd cef-framework
mvn test

# View benchmark results
open docs/benchmark_comparison.png
open docs/EVALUATION_SUMMARY.md

Example Usage

// Define knowledge model
Node patient = new Node(null, "Patient", 
    Map.of("name", "John", "age", 45), 
    "Patient John with diabetes");

// Persist entity
indexer.indexNode(patient).block();

// Define relationship (patientId and diabetesId are assumed to hold the IDs of already-indexed nodes)
Edge hasCondition = new Edge(null, "HAS_CONDITION",
    patientId, diabetesId, null, 1.0);
indexer.indexEdge(hasCondition).block();

// Query context
SearchResult result = retriever.retrieve(
    RetrievalRequest.builder()
        .query("diabetes treatments")
        .depth(2)
        .topK(10)
        .build()
);
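The `depth(2)` setting in the request bounds how many hops the traversal may take from a starting node. As an illustration of that idea only, here is a depth-limited breadth-first traversal over a plain adjacency map; `DepthLimitedTraversal` is a hypothetical class and does not reflect how CEF traverses its graph stores.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not a CEF class): breadth-first traversal bounded by hop count. */
final class DepthLimitedTraversal {

    /** Maps each node reachable within maxDepth hops to its hop distance (start is 0). */
    static Map<String, Integer> reachable(
            Map<String, List<String>> adjacency, String start, int maxDepth) {
        Map<String, Integer> distance = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        distance.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            int hops = distance.get(current);
            if (hops == maxDepth) {
                continue; // do not expand nodes already at the depth limit
            }
            for (String next : adjacency.getOrDefault(current, Collections.emptyList())) {
                if (!distance.containsKey(next)) {
                    distance.put(next, hops + 1);
                    queue.add(next);
                }
            }
        }
        return distance;
    }
}
```

With a chain such as patient, condition, medication, doctor, a depth of 2 starting from the patient reaches the medication but not the doctor.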

🏆 Benchmark Results

Comprehensive test suite validates Knowledge Model superiority over vector-only approaches.

Test Domains

  1. Medical Clinical Decision Support

    • 177 nodes: Patients, Conditions, Medications, Doctors
    • 455 edges: Multi-hop relationships
    • 4 complex scenarios: Contraindication discovery, behavioral patterns, cascading risks, transitive exposure
    • Results: 60-220% improvement, 120% average

  2. SAP ERP Organizational Structure

    • 80+ records: Departments, Cost Centers, Projects, Vendors, Materials, Invoices
    • 14 CSV entities with organizational hierarchies
    • 2 scenarios: Cross-project resource allocation, cost center contagion analysis
    • Results: 60% improvement (both scenarios), proves Graph RAG advantage for structural patterns

Key Findings

Medical Domain:

| Scenario | Vector-Only | Knowledge Model | Improvement |
|---|---|---|---|
| Multi-hop contraindication | 5 chunks | 12 chunks | +140% |
| Behavioral risk patterns | 5 chunks | 8 chunks | +60% |
| Cascading side effects | 5 chunks | 8 chunks | +60% |
| Transitive exposure risk | 5 chunks | 16 chunks | +220% 🔥 |
| Average | 5.0 chunks | 11.0 chunks | +120% |
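The Improvement figures are consistent with a relative gain in retrieved chunks over the vector-only baseline, that is 100 × (knowledge model − vector-only) / vector-only. A small sketch of that arithmetic follows; `Improvement` is a hypothetical helper name.

```java
/** Hypothetical helper: relative gain in retrieved chunks over a baseline. */
final class Improvement {

    /** Percentage gain of the knowledge-model chunk count over the vector-only count. */
    static double percent(double vectorOnly, double knowledgeModel) {
        if (vectorOnly <= 0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return 100.0 * (knowledgeModel - vectorOnly) / vectorOnly;
    }
}
```

For the transitive exposure scenario, `Improvement.percent(5, 16)` gives 220, and the average row follows from `Improvement.percent(5.0, 11.0)`, which gives 120. Note that this measures the amount of context retrieved, not answer accuracy.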

SAP ERP Domain:

| Scenario | Vector-Only | Knowledge Model | Improvement |
|---|---|---|---|
| Cross-project resource allocation | 5 chunks | 8 chunks | +60% |
| Cost center contagion analysis | 5 chunks | 8 chunks | +60% |
| Average | 5.0 chunks | 8.0 chunks | +60% |

Cross-Domain Insight:
Graph RAG wins for structural organizational patterns (Department→CostCenter hierarchies, funding networks). Graph RAG equals vector search for semantically explicit relationships (supply chain descriptions already mentioning vendor-component connections).

Latency:

  • Medical: 26ms avg (+19.5% vs vector-only 22ms)
  • SAP: 43ms avg (+23.2% vs vector-only 35ms)
  • Conclusion: <25% overhead acceptable for 60-220% content improvement

Visualizations:

  • Medical: cef-framework/src/test/resources/scripts/results/benchmark_comparison.png
  • SAP: cef-framework/src/test/resources/scripts/results/sap_benchmark_comparison.png

See docs/EVALUATION_SUMMARY.md for detailed multi-domain analysis.


📊 Performance Characteristics

Tested Configuration: DuckDB + JGraphT + vLLM (Qwen3-Coder-30B) + Ollama (nomic-embed-text)

| Operation | Performance | Notes |
|---|---|---|
| Node indexing | <50ms per node | Single insert |
| Batch indexing | ~2s per 1000 nodes | Transactional batch |
| Graph traversal (depth 2) | <50ms | JGraphT in-memory |
| Vector search (10K chunks) | ~100ms | DuckDB brute-force |
| Hybrid assembly | ~150ms | Graph + vector combined |
| Embedding generation | ~200ms per chunk | Ollama nomic-embed-text |
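"Brute-force" vector search means scoring the query against every stored vector and keeping the best matches, so cost grows linearly with the number of chunks. The sketch below shows the idea in plain Java; the choice of cosine similarity is an assumption (the notes do not name the metric), and `BruteForceSearch` is a hypothetical class unrelated to the DuckDB store.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not a CEF class): exhaustive top-K search by cosine similarity. */
final class BruteForceSearch {

    /** Cosine similarity of two equal-length vectors; 0.0 if either has zero length. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the indexes of the topK most similar vectors, best match first. */
    static List<Integer> topK(float[] query, List<float[]> vectors, int topK) {
        double[] scores = new double[vectors.size()];
        List<Integer> indexes = new ArrayList<>();
        for (int i = 0; i < vectors.size(); i++) {
            scores[i] = cosine(query, vectors.get(i)); // score every vector once
            indexes.add(i);
        }
        indexes.sort((x, y) -> Double.compare(scores[y], scores[x])); // descending by score
        int limit = Math.max(0, Math.min(topK, indexes.size()));
        return new ArrayList<>(indexes.subList(0, limit));
    }
}
```

An approximate index trades some recall for sub-linear lookups, but at the 10K-chunk scale reported here an exhaustive scan is often fast enough.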

Benchmark Results (Medical Domain):

  • Vector-only: 60 chunks retrieved
  • Knowledge Model ORM: 132 chunks retrieved (120% improvement)
  • Relationship-aware context with proper entity boundaries

🔧 Configuration Example

cef:
  # Storage backend
  database:
    type: duckdb  # or postgresql
    duckdb:
      path: ./data/cef.duckdb
  
  # Graph store
  graph:
    store: jgrapht  # or neo4j
    preload-on-startup: true
  
  # Vector store  
  vector:
    store: duckdb  # or postgres, qdrant, pinecone
  
  # LLM provider
  llm:
    default-provider: vllm  # or ollama, openai
    vllm:
      base-url: http://localhost:8001
      model: Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8
  
  # Embedding provider
  embedding:
    provider: ollama  # or openai
    model: nomic-embed-text
    dimension: 768

🐛 Known Limitations

  1. JGraphT Memory - Recommended maximum 100K nodes (~350MB)
  2. PostgreSQL Untested - Schema provided but not integration tested
  3. Concurrent Indexing -...