07 Dec 17:52

mrmanna

badaef1

Latest

Release Notes

Version 0.6 (Research) — December 7, 2025

Research Edition - Production Patterns Implemented, Not Hardened

This release transforms CEF from a single-backend prototype to a multi-backend framework with production patterns. While still research-grade, v0.6 implements foundational resilience, security, and pluggability that can evolve toward production readiness.

🎯 Release Highlights

5 Graph Store Backends - Neo4j, PostgreSQL AGE, PostgreSQL SQL, DuckDB, In-Memory
4 Vector Store Backends - Neo4j, PostgreSQL pgvector, DuckDB VSS, In-Memory
Resilience Patterns - Retry, circuit breaker, timeout for embedding services
Security Foundations - API-key auth, input sanitization, audit logging
178+ Integration Tests - Real infrastructure via Testcontainers (no mocks)
Docker Compose - Neo4j, PostgreSQL+pgvector, Apache AGE, MinIO

✨ New Features

Pluggable Graph Stores (IDR-004)

Store	Config Value	Backend	Tests / Role
Neo4jGraphStore	`neo4j`	Neo4j 5.x Community	18 tests
PgAgeGraphStore	`pg-age`	PostgreSQL + Apache AGE	18 tests
PgSqlGraphStore	`pg-sql`	Pure PostgreSQL SQL	18 tests
DuckDbGraphStore	`duckdb`	DuckDB embedded	Default
InMemoryGraphStore	`in-memory`	JGraphT	Development

Pluggable Vector Stores

Store	Config Value	Backend	Notes
Neo4jChunkStore	`neo4j`	Neo4j vector indexes	Unified with Neo4j graph
R2dbcChunkStore	`postgresql`	PostgreSQL pgvector	Reactive R2DBC
DuckDbChunkStore	`duckdb`	DuckDB VSS	Default
InMemoryChunkStore	`in-memory`	ConcurrentHashMap	Development

Dual-Store Configuration

Graph and vector stores are independently configurable:

cef:
  graph:
    store: neo4j  # neo4j | pg-age | pg-sql | duckdb | in-memory
  vector:
    store: neo4j  # neo4j | postgresql | duckdb | in-memory

Tested Backend Combinations

Profile	Graph Store	Vector Store	Use Case
in-memory	`in-memory`	`in-memory`	Development, CI/CD
duckdb	`duckdb`	`duckdb`	Default, embedded
neo4j	`neo4j`	`neo4j`	Production graphs
pg-sql	`pg-sql`	`postgresql`	Max PostgreSQL compatibility
pg-age	`pg-age`	`postgresql`	Cypher on PostgreSQL
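
As a rough illustration of the table above, the sketch below maps a graph/vector store pair to the tested profile it belongs to, if any. The class and method names are hypothetical and not part of CEF; only the store values and profile names come from the table.

```java
import java.util.Map;
import java.util.Optional;

/** Sketch: look up whether a graph/vector store pair matches a tested profile. */
final class TestedCombinations {
    // Key: "<graph store>/<vector store>", value: profile name from the table above.
    private static final Map<String, String> PROFILES = Map.of(
        "in-memory/in-memory", "in-memory",
        "duckdb/duckdb", "duckdb",
        "neo4j/neo4j", "neo4j",
        "pg-sql/postgresql", "pg-sql",
        "pg-age/postgresql", "pg-age");

    /** Returns the tested profile for this pair, or empty if the pair is not listed. */
    static Optional<String> profileFor(String graphStore, String vectorStore) {
        return Optional.ofNullable(PROFILES.get(graphStore + "/" + vectorStore));
    }

    public static void main(String[] args) {
        System.out.println(profileFor("pg-age", "postgresql")); // a listed pair
        System.out.println(profileFor("neo4j", "duckdb"));      // not in the tested list
    }
}
```

A pair missing from the map is not necessarily unsupported; it only means the release notes do not list it as tested.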

Resilience Infrastructure

CefResilienceProperties.java - Externalized configuration
CefResilienceAutoConfiguration.java - Auto-configuration for Resilience4j
ResilientEmbeddingService.java - Wrapper with retry, circuit breaker, timeout

cef:
  resilience:
    embedding:
      retry:
        max-attempts: 3
        wait-duration: 1s
      circuit-breaker:
        failure-rate-threshold: 50
      timeout: 30s
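
To make the retry settings concrete, here is a minimal stdlib-only sketch of what `max-attempts: 3` with a fixed `wait-duration` amounts to. It is not CEF's ResilientEmbeddingService and does not use Resilience4j; the class and method names are hypothetical. It assumes the Resilience4j convention that the attempt count includes the initial call.

```java
import java.time.Duration;
import java.util.function.Supplier;

/** Sketch: fixed-wait retry, where maxAttempts counts the first call as well. */
final class RetrySketch {
    static <T> T callWithRetry(Supplier<T> task, int maxAttempts, Duration wait) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                last = e;
                // Wait only if another attempt will follow.
                if (attempt < maxAttempts) {
                    sleep(wait);
                }
            }
        }
        throw last; // every attempt failed: surface the last failure
    }

    private static void sleep(Duration wait) {
        try {
            Thread.sleep(wait.toMillis());
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during retry wait", ie);
        }
    }
}
```

With the values shown in the YAML, a call that keeps failing would be tried three times with a one-second pause between tries. The circuit breaker and timeout are separate concerns that this sketch leaves out.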

Thread Safety

ThreadSafeKnowledgeGraph.java - ReadWriteLock wrapper for InMemoryKnowledgeGraph
21 concurrent tests including stress tests
Opt-in via cef.graph.thread-safe=true
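
The ReadWriteLock wrapper idea can be sketched as follows. This is an illustrative stand-in, not the actual ThreadSafeKnowledgeGraph: the class name, the adjacency-map representation, and the two methods are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch: many concurrent readers, one writer at a time, around a plain HashMap graph. */
final class LockedGraphSketch {
    private final Map<String, Set<String>> adjacency = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    /** Mutations take the exclusive write lock. */
    void addEdge(String from, String to) {
        lock.writeLock().lock();
        try {
            adjacency.computeIfAbsent(from, k -> new HashSet<>()).add(to);
            adjacency.computeIfAbsent(to, k -> new HashSet<>());
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Reads take the shared read lock and return an immutable snapshot. */
    Set<String> neighbors(String node) {
        lock.readLock().lock();
        try {
            return Set.copyOf(adjacency.getOrDefault(node, Set.of()));
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Returning a copy keeps callers from iterating over a set that a writer may modify after the read lock is released.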

Security Foundations

CefSecurityProperties.java - Security configuration (JWT, API-Key, OAuth2)
InputSanitizer.java - SQL/Cypher injection, XSS, prompt injection prevention
SecurityAuditLogger.java - Audit logging for security events
CefExceptionHandler.java - Sanitized error responses
49 tests for security components

Input Validation

ValidatedRetrievalRequest.java - JSR-380 validated retrieval DTO
ValidatedNodeInput.java - JSR-380 validated node input DTO
ValidatedEdgeInput.java - JSR-380 validated edge input DTO
29 validation tests

Configuration Hardening

Enhanced CefProperties.java with JSR-380 validation constraints
Validation for: dimension (128-4096), token budget (100-128K), batch size, cache TTL
18 validation tests
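
For illustration only, the two documented ranges can be expressed as plain range checks. CefProperties itself uses JSR-380 annotations, which this stdlib-only sketch does not reproduce; the class name is hypothetical, and reading "128K" as 128,000 is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: range checks mirroring the documented bounds for dimension and token budget. */
final class ConfigRangeSketch {
    static final int MIN_DIMENSION = 128;
    static final int MAX_DIMENSION = 4096;
    static final int MIN_TOKEN_BUDGET = 100;
    static final int MAX_TOKEN_BUDGET = 128_000; // assumption: "128K" read as 128,000

    /** Returns one message per violated bound; an empty list means both values are in range. */
    static List<String> violations(int dimension, int tokenBudget) {
        List<String> errors = new ArrayList<>();
        if (dimension < MIN_DIMENSION || dimension > MAX_DIMENSION) {
            errors.add("dimension out of range: " + dimension);
        }
        if (tokenBudget < MIN_TOKEN_BUDGET || tokenBudget > MAX_TOKEN_BUDGET) {
            errors.add("token budget out of range: " + tokenBudget);
        }
        return errors;
    }
}
```

Batch size and cache TTL are omitted because the release notes give no bounds for them.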

Observability

CefHealthIndicator.java - Combined health indicator for CEF components
KnowledgeGraphHealthIndicator.java - Graph-specific health checks
CefMetrics.java - Micrometer metrics binder for graph statistics

🐳 Docker Compose Updates

New services for v0.6:

# Neo4j (Graph Store)
docker-compose up -d neo4j
# Access: http://localhost:7474 (neo4j/cef_password)

# PostgreSQL + AGE (Graph Store)
docker-compose --profile age up -d postgres-age
# Access: localhost:5433

# Full stack
docker-compose --profile age --profile minio up -d

⚠️ Known Limitations (Research Edition)

See KNOWN_ISSUES.md for complete list:

Security defaults OFF - Must opt-in via cef.security.enabled=true
PgAGE query safety - Manual Cypher escaping, needs parameterization
Resilience coverage - Only embeddings have retry/CB/timeout

Observability gaps - No health indicators for Neo4j/Pg stores

📊 Test Coverage

Category	Tests	Notes
Neo4j Integration	18	Testcontainers
PostgreSQL AGE	18	Testcontainers
PostgreSQL SQL	18	Testcontainers
Security	49	InputSanitizer, AuditLogger
Validation	29	JSR-380 DTOs
Thread Safety	21	Concurrent stress tests
Resilience	7	Real Ollama
Configuration	18	CefProperties validation
Total New	178+	All passing

📚 Documentation Updates

Updated README.md with v0.6 features
Updated USER_GUIDE.md with graph store selection
Updated QUICKSTART.md with new Docker Compose options
Updated ARCHITECTURE.md with storage architecture
Added ddse/v0.6/IDR-004.md (Implementation Decision Record)

02 Dec 12:25

mrmanna

beta-0.5.0

b1dec0b

Release Notes - beta-0.5

Release Notes

Version beta-0.5 (November 27, 2025)

First Public Beta Release

This is the initial beta release of the Context Engineering Framework (CEF) from DDSE Foundation. CEF provides an ORM-like abstraction for LLM context engineering, managing knowledge models through dual persistence (graph + vector stores).

🎯 Release Highlights

ORM for Context Engineering - Define knowledge models (nodes, edges) like JPA entities
Dual Persistence - Automatic management of graph and vector stores
Intelligent Context Assembly - 3-level strategy (relationship navigation → semantic → keyword)
Standard Patterns - Repository layer, service patterns, lifecycle hooks
Comprehensive Documentation - USER_GUIDE, ARCHITECTURE, examples

Note: This is a Community/Research Release. It is optimized for ease of use and experimentation, not for high-concurrency enterprise production environments. It is ideal for "Disposable Research Pods"—ephemeral environments spun up to analyze specific datasets (e.g., financial audits, clinical trials) and then discarded.

✨ Core Features

Knowledge Model ORM

Entity Persistence - Node and Edge entities with JSONB properties
Relationship Navigation - Multi-hop graph traversal with semantic filtering
Vectorizable Content - Automatic embedding generation and persistence
RelationType System - Semantic hints (HIERARCHY, CAUSALITY, ASSOCIATION, etc.)
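
Multi-hop traversal with a depth limit can be sketched as a breadth-first search over an adjacency map. This is a generic illustration, not CEF's traversal code: the class name and the string-keyed graph are assumptions, and filtering by relation semantics is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: nodes reachable from a start node within a fixed number of hops. */
final class MultiHopSketch {
    /** Returns reachable nodes (start excluded) in breadth-first order. */
    static List<String> reachable(Map<String, List<String>> adjacency, String start, int maxDepth) {
        List<String> result = new ArrayList<>();
        Set<String> visited = new HashSet<>(Set.of(start));
        Deque<String> frontier = new ArrayDeque<>(List.of(start));
        // Each loop iteration expands the frontier by exactly one hop.
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String node : frontier) {
                for (String neighbor : adjacency.getOrDefault(node, List.of())) {
                    if (visited.add(neighbor)) { // true only on first visit
                        result.add(neighbor);
                        next.add(neighbor);
                    }
                }
            }
            frontier = next;
        }
        return result;
    }
}
```

For a chain such as patient, condition, medication, a depth of 1 returns only the condition, while a depth of 2 also reaches the medication.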

Storage Backends (Pluggable)

✅ DuckDB - Embedded database (default, tested)
✅ JGraphT - In-memory graph store (default, tested)
⚠️ PostgreSQL - External database with pgvector (configured, untested)
⚠️ Neo4j - Graph database for large-scale deployments (configured, untested)
⚠️ Qdrant - Vector database (configured, untested)
⚠️ Pinecone - Cloud vector database (configured, untested)

LLM Integration

✅ vLLM - Production inference server with Qwen3-Coder-30B-A3B-Instruct-FP8 (tested)
✅ Ollama Embeddings - nomic-embed-text model, 768 dimensions (tested)
⚠️ OpenAI - GPT-4, GPT-3.5 Turbo (configured, untested)
⚠️ Ollama LLM - Llama 3.x models (configured, untested)

Context Assembly

Pattern-Based Retrieval - GraphPattern with TraversalStep and Constraint
Multi-Hop Reasoning - Configurable depth (1-5 hops)
3-Level Fallback - Graph → Hybrid → Vector-only
Semantic Filtering - Relationship semantics-aware traversal
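
One way to picture the 3-level fallback is an ordered list of retrieval strategies that is tried until one yields results. The sketch below assumes that an empty result is what triggers the next level; the release notes do not state the actual trigger, and the names used here are hypothetical.

```java
import java.util.List;
import java.util.function.Function;

/** Sketch: try retrieval strategies in order and return the first non-empty result. */
final class FallbackSketch {
    static List<String> retrieve(String query, List<Function<String, List<String>>> strategies) {
        for (Function<String, List<String>> strategy : strategies) {
            List<String> chunks = strategy.apply(query);
            if (!chunks.isEmpty()) {
                return chunks; // assumption: a non-empty result ends the fallback chain
            }
        }
        return List.of(); // every level came back empty
    }
}
```

Passing the strategies in the order graph, hybrid, vector-only reproduces the Graph → Hybrid → Vector-only sequence named above.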

Developer Experience

Repository Pattern - Domain-specific facades over ORM layer
Service Layer - Business logic separation with transaction support
Reactive API - Spring WebFlux + R2DBC for non-blocking I/O
Configuration - YAML-based with sensible defaults

📦 What's Included

Framework (cef-framework)

<dependency>
    <groupId>org.ddse.ml</groupId>
    <artifactId>cef-framework</artifactId>
    <version>beta-0.5</version>
</dependency>

KnowledgeIndexer - Entity persistence (like EntityManager)
KnowledgeRetriever - Context queries (like Repository)
GraphStore - Pluggable graph backend interface
VectorStore - Pluggable vector backend interface
Node, Edge, Chunk, RelationType - Core domain entities
GraphPattern, TraversalStep, Constraint - Query DSL

Comprehensive Test Suite

Medical Domain: 150 patients, 5 conditions, 7 medications, 15 doctors (177 nodes, 455 edges)
Financial Domain: SAP-simulated data (vendors, materials, purchase orders, invoices)
Benchmarks: 4 complex scenarios proving Knowledge Model superiority
Results: 60-220% more chunks retrieved than vector-only search (see docs/EVALUATION_SUMMARY.md)

Documentation

USER_GUIDE.md - Complete ORM integration guide (30KB, 1,200 lines)
ARCHITECTURE.md - Technical deep dive
QUICKSTART.md - Getting started in 5 minutes
KNOWN_ISSUES.md - Testing status and limitations
README.md - Project overview

🧪 Testing Status

Thoroughly Tested ✅

DuckDB embedded database
JGraphT in-memory graph (up to 100K nodes)
vLLM with Qwen3-Coder-30B-A3B-Instruct-FP8
Ollama embeddings (nomic-embed-text, 768 dimensions)
Pattern-based retrieval with multi-hop reasoning
Medical domain example with benchmarks

Configured but Untested ⚠️

PostgreSQL + pgvector
Neo4j graph database
OpenAI GPT models
Ollama LLM models (Llama 3.x)
Qdrant vector database
Pinecone vector database

See KNOWN_ISSUES.md for details.

🚀 Getting Started

Prerequisites

Java 17+
Maven 3.8+
Docker & Docker Compose

Quick Start

# Clone repository
git clone <repository-url>
cd cef

# Start services (Ollama for embeddings)
docker-compose up -d

# Note: vLLM (Qwen3-Coder-30B) required for benchmark reproduction
# See https://docs.vllm.ai/ for installation

# Build framework
mvn clean install

# Run test suite (includes benchmarks)
cd cef-framework
mvn test

# View benchmark results
open docs/benchmark_comparison.png
open docs/EVALUATION_SUMMARY.md

Example Usage

// Define knowledge model
Node patient = new Node(null, "Patient",
    Map.of("name", "John", "age", 45),
    "Patient John with diabetes");

// Persist entity
indexer.indexNode(patient).block();

// Define relationship
// (patientId and diabetesId are the IDs of two already-indexed nodes; obtaining them is not shown)
Edge hasCondition = new Edge(null, "HAS_CONDITION",
    patientId, diabetesId, null, 1.0);
indexer.indexEdge(hasCondition).block();

// Query context: graph traversal up to 2 hops, top 10 results
SearchResult result = retriever.retrieve(
    RetrievalRequest.builder()
        .query("diabetes treatments")
        .depth(2)
        .topK(10)
        .build()
);

🏆 Benchmark Results

Comprehensive test suite validates Knowledge Model superiority over vector-only approaches.

Test Domains

Medical Clinical Decision Support
- 177 nodes: Patients, Conditions, Medications, Doctors
- 455 edges: Multi-hop relationships
- 4 complex scenarios: Contraindication discovery, behavioral patterns, cascading risks, transitive exposure
- Results: 60-220% improvement, 120% average

SAP ERP Organizational Structure
- 80+ records: Departments, Cost Centers, Projects, Vendors, Materials, Invoices
- 14 CSV entities with organizational hierarchies
- 2 scenarios: Cross-project resource allocation, cost center contagion analysis
- Results: 60% improvement (both scenarios), proves Graph RAG advantage for structural patterns

Key Findings

Medical Domain:

Scenario	Vector-Only	Knowledge Model	Improvement
Multi-hop contraindication	5 chunks	12 chunks	+140%
Behavioral risk patterns	5 chunks	8 chunks	+60%
Cascading side effects	5 chunks	8 chunks	+60%
Transitive exposure risk	5 chunks	16 chunks	+220% 🔥
Average	5.0 chunks	11.0 chunks	+120%
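
The Improvement column follows from the chunk counts as a relative change against the vector-only baseline. The short sketch below recomputes it from the figures in the table; the class and method names are illustrative.

```java
/** Sketch: recompute the Improvement column from the chunk counts in the table. */
final class ImprovementSketch {
    /** Relative change of value against baseline, in percent. */
    static double improvementPercent(double baseline, double value) {
        return (value - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double baseline = 5.0;                  // vector-only chunks per scenario
        double[] knowledgeModel = {12, 8, 8, 16}; // Knowledge Model chunks per scenario
        double sum = 0;
        for (double chunks : knowledgeModel) {
            double pct = improvementPercent(baseline, chunks);
            sum += pct;
            System.out.printf("%.0f chunks: %+.0f%%%n", chunks, pct);
        }
        System.out.printf("average: %+.0f%%%n", sum / knowledgeModel.length);
    }
}
```

For example, 12 chunks against a baseline of 5 gives (12 - 5) / 5 = 1.4, which is the +140% in the first row.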

SAP ERP Domain:

Scenario	Vector-Only	Knowledge Model	Improvement
Cross-project resource allocation	5 chunks	8 chunks	+60%
Cost center contagion analysis	5 chunks	8 chunks	+60%
Average	5.0 chunks	8.0 chunks	+60%

Cross-Domain Insight:
Graph RAG wins for structural organizational patterns (Department→CostCenter hierarchies, funding networks). Graph RAG equals vector search for semantically explicit relationships (supply chain descriptions already mentioning vendor-component connections).

Latency:

Medical: 26ms avg (+19.5% vs vector-only 22ms)
SAP: 43ms avg (+23.2% vs vector-only 35ms)
Conclusion: <25% overhead acceptable for 60-220% content improvement

Visualizations:

Medical: cef-framework/src/test/resources/scripts/results/benchmark_comparison.png
SAP: cef-framework/src/test/resources/scripts/results/sap_benchmark_comparison.png

See docs/EVALUATION_SUMMARY.md for detailed multi-domain analysis.

📊 Performance Characteristics

Tested Configuration: DuckDB + JGraphT + vLLM (Qwen3-Coder-30B) + Ollama (nomic-embed-text)

Operation	Performance	Notes
Node indexing	<50ms per node	Single insert
Batch indexing	~2s per 1000 nodes	Transactional batch
Graph traversal (depth 2)	<50ms	JGraphT in-memory
Vector search (10K chunks)	~100ms	DuckDB brute-force
Hybrid assembly	~150ms	Graph + vector combined
Embedding generation	~200ms per chunk	Ollama nomic-embed-text
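
"Brute-force" vector search in the table means scoring every stored chunk against the query rather than using an index. The sketch below shows the idea with cosine similarity in plain Java. It does not reflect how DuckDB executes the query; all names are hypothetical, and vectors are assumed to be non-zero and of equal length.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch: exhaustive top-K search by cosine similarity. */
final class BruteForceSearchSketch {
    /** Cosine similarity of two equal-length, non-zero vectors. */
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Scores every chunk against the query and returns the k most similar chunk IDs. */
    static List<String> topK(Map<String, float[]> chunks, float[] query, int k) {
        return chunks.entrySet().stream()
            .sorted(Comparator.comparingDouble(
                (Map.Entry<String, float[]> e) -> -cosine(e.getValue(), query)))
            .limit(k)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }
}
```

Cost grows linearly with the number of chunks, which is consistent with this approach being practical at the 10K-chunk scale listed in the table.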

Benchmark Results (Medical Domain):

Vector-only: 60 chunks retrieved
Knowledge Model ORM: 132 chunks retrieved (120% improvement)
Relationship-aware context with proper entity boundaries

🔧 Configuration Example

cef:
  # Storage backend
  database:
    type: duckdb  # or postgresql
    duckdb:
      path: ./data/cef.duckdb
  
  # Graph store
  graph:
    store: jgrapht  # or neo4j
    preload-on-startup: true
  
  # Vector store  
  vector:
    store: duckdb  # or postgres, qdrant, pinecone
  
  # LLM provider
  llm:
    default-provider: vllm  # or ollama, openai
    vllm:
      base-url: http://localhost:8001
      model: Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8
  
  # Embedding provider
  embedding:
    provider: ollama  # or openai
    model: nomic-embed-text
    dimension: 768

🐛 Known Limitations

JGraphT Memory - Recommended maximum 100K nodes (~350MB)
PostgreSQL Untested - Schema provided but not integration tested
Concurrent Indexing -...


Releases: ddse-foundation/cef
