issue/Incomplete Error Context Propagation from Rust Core to Python Layer #267

@rahmat-ullah

Issue Description

When errors occur in the Rust core library and are converted to Python exceptions through the bindings layer, significant context information is lost. The current error conversion mechanism in python/src/errors.rs performs lossy transformations that discard valuable debugging information such as provider names, field names, and retry information.

Affected Components

  • Rust Core: core/src/errors.rs - Error type definitions
  • Python Layer: python/src/errors.rs - Error conversion functions
  • Integration: Error propagation between layers

Related Files

  • python/src/errors.rs (lines 129-161)
  • core/src/errors.rs (lines 14-129)
  • python/src/llm/client.rs (lines 985-1002)

Code Snippets

Current Error Conversion (Lossy)

// From python/src/errors.rs (lines 144-152)
GraphBitError::RateLimit { .. } => PythonBindingError::RateLimit {
    message: "Rate limit exceeded".to_string(),
    retry_after: None,  // ← Context lost: provider name and retry_after_seconds discarded
},
GraphBitError::Validation { message, .. } => PythonBindingError::Validation {
    message: message.clone(),
    field: "unknown".to_string(),  // ← Context lost: actual field name discarded
    value: None,
},

Rust Core Error Definition (Rich Context)

// From core/src/errors.rs (lines 100-107)
#[error("Rate limit exceeded: {provider} - retry after {retry_after_seconds}s")]
RateLimit {
    provider: String,           // ← Available in core
    retry_after_seconds: u64,   // ← Available in core
},

#[error("Validation error: {field} - {message}")]
Validation {
    field: String,              // ← Available in core
    message: String,
},

Python Binding Error Definition (Partial Context)

// From python/src/errors.rs (lines 42-46)
RateLimit {
    message: String,
    retry_after: Option<u64>,   // ← Could be populated but isn't
},

Impact Assessment

Severity: Medium

Affected Areas:

  • Debugging: Developers lose critical context when troubleshooting failures. For example, a RateLimit error loses the provider name and retry_after_seconds, making it harder to identify which provider is rate-limiting and when to retry.
  • Error Recovery: Applications cannot implement sophisticated retry logic. The retry_after field in PythonBindingError::RateLimit is always None, even though the core error contains retry_after_seconds. This forces applications to use generic retry delays instead of provider-specific recommendations.
  • Observability: Monitoring systems cannot distinguish between different error scenarios. A Validation error always shows field: "unknown" instead of the actual field name from the core error.
  • User Experience: Generic error messages reduce clarity for end users. Error messages don't include provider context that would help users understand which service failed.
  • Production Debugging: In production, logs show generic messages like "Rate limit exceeded" instead of "Rate limit exceeded: openai - retry after 30s", making it harder to correlate errors with specific providers.
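To make the Error Recovery point concrete, here is a minimal sketch of the retry decision a caller could make once retry_after is populated. The helper name retry_delay and the fallback parameter are hypothetical and not part of the codebase; only the Option<u64> shape (seconds) comes from the snippets above.

```rust
use std::time::Duration;

/// Hypothetical helper (not in the repository): choose how long to wait
/// before retrying a rate-limited call. `retry_after` mirrors the
/// `retry_after: Option<u64>` field, interpreted as seconds.
fn retry_delay(retry_after: Option<u64>, fallback: Duration) -> Duration {
    match retry_after {
        // The provider supplied a hint, so honor it.
        Some(secs) => Duration::from_secs(secs),
        // No hint available, which is always the case with the current
        // lossy conversion: fall back to a generic delay.
        None => fallback,
    }
}
```

With the current conversion, the None branch is the only one ever taken, so every caller ends up using the generic fallback.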

Potential Solutions

Recommendation 1: Populate All Available Fields in Error Conversion

The most straightforward fix is to extract and preserve all available context when converting from GraphBitError to PythonBindingError. Specific improvements:

  • RateLimit errors: Extract provider and retry_after_seconds from the core error, include both in the message, and populate the retry_after field in the Python error
  • Validation errors: Extract the field name from the core error instead of using "unknown"
  • Authentication errors: Extract and populate the provider field
  • LlmProvider errors: Create a new Python error variant or extend existing ones to capture provider-specific context

Example fix for RateLimit:

GraphBitError::RateLimit { provider, retry_after_seconds } => PythonBindingError::RateLimit {
    message: format!("Rate limit exceeded: {} - retry after {}s", provider, retry_after_seconds),
    // ← Now populated. The dereference assumes the match is on a reference,
    //   as message.clone() in the current conversion suggests.
    retry_after: Some(*retry_after_seconds),
},
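A self-contained sketch of the same idea covering both RateLimit and Validation is shown below. The two enums are stand-ins reduced to the variants quoted in this issue, the function name to_binding_error is hypothetical, and the type of value (Option<String>) is an assumption, since the issue only shows it being set to None.

```rust
// Stand-in for the core error type, reduced to the variants quoted above.
#[derive(Debug)]
enum GraphBitError {
    RateLimit { provider: String, retry_after_seconds: u64 },
    Validation { field: String, message: String },
}

// Stand-in for the binding error type. `value: Option<String>` is an
// assumption; the issue only shows `value: None`.
#[derive(Debug, PartialEq)]
enum PythonBindingError {
    RateLimit { message: String, retry_after: Option<u64> },
    Validation { message: String, field: String, value: Option<String> },
}

/// Hypothetical name for the conversion; the real function lives in
/// python/src/errors.rs.
fn to_binding_error(err: &GraphBitError) -> PythonBindingError {
    match err {
        GraphBitError::RateLimit { provider, retry_after_seconds } => {
            PythonBindingError::RateLimit {
                message: format!(
                    "Rate limit exceeded: {} - retry after {}s",
                    provider, retry_after_seconds
                ),
                // Matching on a reference binds `&u64`, so dereference it.
                retry_after: Some(*retry_after_seconds),
            }
        }
        GraphBitError::Validation { field, message } => PythonBindingError::Validation {
            message: message.clone(),
            // Carry the real field name across instead of "unknown".
            field: field.clone(),
            value: None,
        },
    }
}
```

Both arms only copy data that the core error already holds, so the change stays inside the conversion function.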

Recommendation 2: Add Provider Context to Python Error Types

Extend PythonBindingError variants to include provider information where applicable:

  • Add provider: Option<String> field to RateLimit variant
  • Add provider: Option<String> field to LlmProvider variant (if created)
  • Update the Display implementation to include provider context in error messages
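One possible shape for the extended variant and its Display implementation is sketched below. The enum is a stand-in with a single variant, and the output format (the parenthesized suffixes) is an illustrative assumption, not the format the existing Display implementation uses.

```rust
use std::fmt;

// Stand-in for the binding error type, showing only RateLimit.
#[derive(Debug)]
enum PythonBindingError {
    RateLimit {
        message: String,
        retry_after: Option<u64>,
        // Proposed new field from this recommendation.
        provider: Option<String>,
    },
}

impl fmt::Display for PythonBindingError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PythonBindingError::RateLimit { message, retry_after, provider } => {
                write!(f, "{}", message)?;
                // Append context only when it is actually present.
                if let Some(p) = provider {
                    write!(f, " (provider: {})", p)?;
                }
                if let Some(secs) = retry_after {
                    write!(f, " (retry after {}s)", secs)?;
                }
                Ok(())
            }
        }
    }
}
```

Keeping provider as an Option means call sites that have no provider to report can still construct the variant with None.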

Recommendation 3: Implement Error Metadata Accessor

Add methods to PythonBindingError that allow Python code to programmatically access error context:

impl PythonBindingError {
    pub fn get_provider(&self) -> Option<String> { ... }
    pub fn get_retry_after(&self) -> Option<u64> { ... }
    pub fn get_field(&self) -> Option<String> { ... }
}
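The elided bodies could be filled in roughly as follows. This is a sketch against a stand-in enum: the provider field on RateLimit assumes Recommendation 2 has been applied, and the type of value is an assumption. Exposing these methods to Python through the bindings is not shown.

```rust
// Stand-in for the binding error type. `provider` assumes Recommendation 2;
// `value: Option<String>` is an assumption.
#[allow(dead_code)] // some fields are not read in this reduced sketch
#[derive(Debug)]
enum PythonBindingError {
    RateLimit { message: String, retry_after: Option<u64>, provider: Option<String> },
    Validation { message: String, field: String, value: Option<String> },
}

impl PythonBindingError {
    /// Provider name, for variants that carry one.
    pub fn get_provider(&self) -> Option<String> {
        match self {
            Self::RateLimit { provider, .. } => provider.clone(),
            _ => None,
        }
    }

    /// Suggested retry delay in seconds, if the error carries one.
    pub fn get_retry_after(&self) -> Option<u64> {
        match self {
            Self::RateLimit { retry_after, .. } => *retry_after,
            _ => None,
        }
    }

    /// Name of the field that failed validation, if applicable.
    pub fn get_field(&self) -> Option<String> {
        match self {
            Self::Validation { field, .. } => Some(field.clone()),
            _ => None,
        }
    }
}
```

Each accessor returns None for variants that do not carry the requested piece of context, so callers do not need to match on the variant first.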

Recommendation 4: Add Comprehensive Error Logging

Ensure that the full error context is logged before conversion (already done at line 133 of python/src/errors.rs), and consider adding structured logging that captures each error field as a separate key for observability systems.
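As a rough illustration of the structured form, the sketch below flattens a core error into key/value pairs that could be handed to whatever logger the project uses. The function log_fields and the key names are hypothetical, and the enum is a stand-in reduced to the variants quoted in this issue; no specific logging crate is assumed.

```rust
// Stand-in for the core error type, reduced to the variants quoted above.
#[derive(Debug)]
enum GraphBitError {
    RateLimit { provider: String, retry_after_seconds: u64 },
    Validation { field: String, message: String },
}

/// Hypothetical helper: flatten an error into key/value pairs so that each
/// piece of context becomes a separate, queryable log field.
fn log_fields(err: &GraphBitError) -> Vec<(&'static str, String)> {
    match err {
        GraphBitError::RateLimit { provider, retry_after_seconds } => vec![
            ("kind", "rate_limit".to_string()),
            ("provider", provider.clone()),
            ("retry_after_seconds", retry_after_seconds.to_string()),
        ],
        GraphBitError::Validation { field, message } => vec![
            ("kind", "validation".to_string()),
            ("field", field.clone()),
            ("message", message.clone()),
        ],
    }
}
```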

Current Strengths

The error handling infrastructure is well-designed:

  • ✅ Core errors capture rich context (provider names, field names, retry information)
  • ✅ Python binding errors have fields to hold this context (e.g., retry_after: Option<u64>)
  • ✅ Error logging is comprehensive (line 133 logs full error before conversion)
  • ✅ Display implementation for Python errors is well-structured and includes context when available
  • ✅ Error conversion function exists and is centralized (single point of improvement)

Verification Notes

Issue Confirmed: The code at lines 140-152 in python/src/errors.rs demonstrates the context loss:

  • Lines 144-147: RateLimit error discards provider and retry_after_seconds
  • Lines 148-152: Validation error discards the actual field name
  • Lines 140-143: Authentication error discards provider context

Positive Finding: The PythonBindingError enum already has fields to hold this context (e.g., retry_after: Option<u64> at line 46), so the fix only requires populating these fields during conversion.

Notes

This issue is particularly important for production deployments where detailed error information is essential for monitoring, alerting, and automated recovery mechanisms. The good news is that the core library already captures all necessary context; it is just not being propagated to Python. The fix is straightforward and low-risk, requiring only changes to the error conversion logic in python/src/errors.rs (lines 129-161).