Oleg Solozobov dev404ai

Oleg Solozobov

Production AI Reliability & Operational Evidence · OEP | Agent Runtime · Observability · Evals · Replay

Platform artifacts for production AI and agent runtime systems: workflow orchestration, tool-call permission and identity records, agent-step telemetry, release manifests, eval traces, replay / reconstruction packets, rollout gates, and incident evidence.

Author of the Operational Evidence Plane (OEP) for Agentic AI, an open reference architecture for the operational-evidence layer of agent runtime systems. v0.3.0 joins release manifests, runtime events, permission records, traces, evals, replay state, and reconstruction packets under stable decision_id values, and supports counterfactual replay across policy, cost, drift, cache, and identity metadata. Concept DOI 10.5281/zenodo.20051036; v0.3.0 archive DOI 10.5281/zenodo.20363793.
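As a rough illustration of what joining evidence under a stable decision_id can look like, here is a minimal Python sketch. It is not taken from the OEP codebase: the record shape (`decision_id`, `kind`, `payload`), the `EXPECTED_KINDS` names, and both helper functions are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical evidence kinds for illustration; the actual OEP schemas may differ.
EXPECTED_KINDS = (
    "release_manifest",
    "runtime_event",
    "permission_record",
    "trace",
    "eval",
    "replay_state",
)


def join_by_decision_id(records):
    """Group flat evidence records into one packet per decision_id."""
    packets = defaultdict(lambda: defaultdict(list))
    for rec in records:
        packets[rec["decision_id"]][rec["kind"]].append(rec)
    # Convert nested defaultdicts to plain dicts so lookups do not create keys.
    return {decision_id: dict(kinds) for decision_id, kinds in packets.items()}


def missing_kinds(packet):
    """List the expected evidence kinds that a packet does not contain."""
    return [kind for kind in EXPECTED_KINDS if kind not in packet]


# Made-up records, assumed to arrive from separate systems.
records = [
    {"decision_id": "d-001", "kind": "release_manifest", "payload": {"version": "1.4.2"}},
    {"decision_id": "d-001", "kind": "permission_record", "payload": {"tool": "search", "allowed": True}},
    {"decision_id": "d-002", "kind": "trace", "payload": {"span_count": 3}},
]

packets = join_by_decision_id(records)
for decision_id, packet in sorted(packets.items()):
    print(decision_id, "has", sorted(packet), "missing", missing_kinds(packet))
```

The point of the sketch is that a shared key lets records from different systems be assembled after the fact, and makes gaps in the evidence visible per decision.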

Method spec: Decision Evidence Maturity Model (DEMM) - arXiv:2605.04093. Empirical pilot: arXiv:2605.12078.

PhD research: the Operational Evidence Plane and counterfactual replay for production AI and agent runtime systems.

Start Here

Primary proof: Operational Evidence Plane v0.3.0 - reference implementation for reconstructable agent-runtime evidence.
Reconstruction proof: Decision Trace Reconstructor - reports evidenced, partial, absent, and opaque facts in agent / automated-decision traces.
Research anchors: DEMM method preprint, agent-decision reconstructability pilot, and the publication pipeline below.

Research Preprints

Agentic AI / DEMM:

Decision Evidence Maturity Model for Agentic AI: A Property-Level Method Specification - arXiv:2605.04093.
Property-Level Reconstructability of Agent Decisions: An Anchor-Level Pilot Across Vendor SDK Adapter Regimes - arXiv:2605.12078.

Operational evidence foundation:

Decision Trace Schema for Governance Evidence - arXiv:2604.09296.
Evidence Sufficiency / Delayed Ground Truth - arXiv:2604.15740.
Label-Free Governance Degradation - arXiv:2604.17836.
Governed Decisioning + Agentic - arXiv:2604.19112.
Post-Incident Decision Reconstruction - SSRN DOI 10.2139/ssrn.6457861.

Focus Areas

Production AI Reliability (release manifests, eval-to-release gates, incident evidence)
Agent Runtime / Workflow Infrastructure (tool-use workflows, state / replay, safe execution evidence)
Observability & Evals Infrastructure (eval / telemetry linkage, traces, quality loops)
Operational Evidence & Incident Reconstruction (event identity, lineage, reconstruction packets)
Agent Permissions / Identity / Policy Controls (tool-call authorization, policy lifecycle, agent-to-service evidence)
Release Gates & Reliability Engineering (canary, shadow, rollback, postmortem packets)
Platform / Control Plane Engineering (distributed services, Kubernetes, GitOps, multi-cloud)
Data & Streaming Infrastructure (events, schemas, evidence joins, delayed-label systems)

Selected Public Proof

Current strongest public proof:

operational-evidence-plane - v0.3.0 public reference implementation for production AI / agent-runtime operational evidence: release manifests, agent-step events, tool-call permission packets, operational traces, eval results, reconstruction packets, deterministic code-review demo, Bedrock translation, and counterfactual replay across policy / cost / drift / cache / identity metadata. Apache-2.0. Concept DOI: 10.5281/zenodo.20051036; v0.3.0 DOI: 10.5281/zenodo.20363793.
decision-trace-reconstructor - v0.1.0 trace reconstruction tool that reports evidenced, partial, absent, and opaque decision facts across LangSmith, OpenTelemetry, Bedrock, OpenAI Agents, Anthropic, MCP, and other adapters. Zenodo DOI: 10.5281/zenodo.19851574.
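To make the four reported statuses concrete, the following Python sketch classifies the facts in a toy trace. It is an assumption-laden illustration, not the decision-trace-reconstructor API: the `Status` enum, the `classify` and `report` functions, the fact fields (`value`, `source`, `redacted`), and the rule assigned to each status are all hypothetical readings of "evidenced, partial, absent, and opaque".

```python
from enum import Enum


class Status(Enum):
    EVIDENCED = "evidenced"
    PARTIAL = "partial"
    ABSENT = "absent"
    OPAQUE = "opaque"


def classify(fact):
    """Assign a status to one fact; the rules here are assumptions for this sketch."""
    if fact is None:
        return Status.ABSENT  # nothing was recorded for this fact
    if fact.get("redacted"):
        return Status.OPAQUE  # a record exists but its content cannot be inspected
    if fact.get("value") is not None and fact.get("source"):
        return Status.EVIDENCED  # value is present and traceable to a source
    return Status.PARTIAL  # something was recorded, but value or provenance is missing


def report(trace, required):
    """Map every required fact name to its status in the given trace."""
    return {name: classify(trace.get(name)) for name in required}


# Made-up trace with hypothetical fact names.
trace = {
    "model_version": {"value": "m-7", "source": "release_manifest"},
    "tool_permission": {"value": "allow"},
    "prompt": {"redacted": True},
}
required = ["model_version", "tool_permission", "prompt", "policy_version"]

result = report(trace, required)
for name, status in result.items():
    print(f"{name}: {status.value}")
```

Separating "absent" from "opaque" matters in this reading: the first means no evidence was captured, while the second means evidence exists but cannot be read from the trace alone.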

Foundational operational-evidence artifacts:

decision-event-schema - v0.3.0 JSON Schema for decision / action events and reconstruction-oriented evidence identity. Concept DOI: 10.5281/zenodo.18923177.
evidence-collector-sdk - v0.2.0 SDK for turning raw operational signals into provenance-bearing decision evidence records. Concept DOI: 10.5281/zenodo.19245404.
evidence-sufficiency-calc - v0.2.0 calculator for scoring whether available operational proof is sufficient for a decision context. Concept DOI: 10.5281/zenodo.19233930.
governance-drift-toolkit - v0.2.1 toolkit for monitoring degradation of governance evidence in delayed-label environments. Concept DOI: 10.5281/zenodo.19236417.
governance-benchmark-dataset - v0.2.0 benchmark dataset for comparing evidence-property feasibility across decision-system architectures. Concept DOI: 10.5281/zenodo.19248722.

Supporting policy-as-code project:

RuleHub - Policy-as-Code ecosystem for AI / ML guardrails, policy enforcement, and reproducible evidence. It is currently secondary to OEP and serves as a policy / agent-runtime bridge rather than the lead artifact.

Agent Runtime & Operational Evidence

Cloud & Platform Engineering

Data, Streaming & Evidence Joins

Observability, Evals & Reliability

Policy, Identity & Safeguards
